Abstract

The  influence  diagram  is  a  decision  analysis  tool.  It  shows  how  random  variables  and  decisions 
affect  (or  influence)  a  value  function.  The  influence  diagram  can  also  be  used  for  probabilistic 
analysis by ignoring decision variables and the value function. In this case, the influence diagram
represents a joint distribution function of the random variables.

The  influence  diagram  represents  a  joint  distribution  function  in  conditional  form,  with  one 
random  variable  in  unconditional  form,  a  second  random  variable  conditioned  on  the  first,  a  third 
conditioned  on  the  first  two,  and  so  on.  If  a  random  variable  is  conditioned  on  another,  then 
the  conditioning  order  can  be  reversed  with  Bayes’  rule.  In  this  manner,  any  random  variable 
can eventually be expressed in unconditional form, or as being conditioned on any other subset of
random  variables. 

Under  certain  conditions,  the  influence  diagram  can  represent  continuous  random  variables. 
When  the  random  variables  are  continuous  and  jointly  Gaussian,  then  each  random  variable  is 
completely  specified  by  its  mean  and  variance.  The  influence  diagram  calculates  the  conditional 
mean and conditional variance of a given random variable, given a subset of the remaining random
variables. The conditional mean and variance are sufficient to completely specify the conditional
density  function.  The  conditioning  order  of  the  random  variables  may  be  changed  (again  using 
Bayes’  rule)  so  that  any  random  variable  can  be  conditioned  on  any  other  subset  of  random  variables 
in  the  diagram. 

The  discrete-time  Kalman  filter  is  a  conditional  mean  estimator  of  the  states  of  a  linear 
stochastic process, conditioned on the previous state and current measurements. An influence
diagram can represent the states, measurements, and initial conditions of a linear stochastic process.
Under these conditions, the influence diagram is also a conditional mean estimator of the states of
a linear stochastic process. The influence diagram algorithm becomes an alternative algorithm for
discrete-time  filtering. 

Two  important  characteristics  of  any  algorithm  are  its  speed  and  numerical  properties.  The 
speed  of  the  algorithm  is  determined  by  the  number  and  type  of  mathematical  operations  required 
by a digital computer implementing the algorithm. The numerical properties of an algorithm
depend  upon  how  it  handles  the  roundoff  errors  that  are  inherent  in  a  digital  computer.  These 
roundoff  errors  cause  the  computed  values  to  deviate  from  the  true  values  that  would  be  computed 
on  a  machine  with  no  roundoff  errors  (infinite  wordlength).  One  measure  of  numerical  properties 
is  the  stability  of  an  algorithm.  The  errors  caused  by  a  stable  algorithm  are  related  to  rounding 
errors  in  the  original  values  input  to  the  algorithm. 

The  speed  of  the  Kalman  filter  is  well  known.  The  numerical  properties  are  known  also,  but 
can  be  unsatisfactory.  The  conventional  Kalman  filter  algorithm  is  not  a  stable  algorithm  and  has 
numerical  precision  problems  under  certain  conditions.  One  solution  to  the  numerical  problems  of 
the  conventional  Kalman  filter  has  been  to  use  factored  forms  of  the  covariance  matrix  as  the  basis 
for  computation.  These  factored  forms  of  the  Kalman  filter  are  slower  than  the  conventional  form, 
but  allow  better  numerical  properties.  The  U-D  factored  form  of  the  filter  offers  the  best  numeric 
properties  with  reasonable  speed. 

This  research  focused  on  the  numeric  properties  and  the  speed  of  the  influence  diagram.  It 
revealed  that  the  influence  diagram  algorithm  for  discrete-time  filtering  uses  a  factored  form  of  the 
covariance matrix. This factored form is essentially a mirror image of the factorization used by the
U-D  filter. 

Previous  research  showed  that  the  speed  of  the  influence  diagram  was  equivalent  to  the  U-D 
filter.  However,  this  research  revealed  circumstances  that  allow  significant  savings  in  the  number 
of  operations  required  by  the  influence  diagram.  These  savings  do  not  make  the  influence  diagram 
faster  than  the  Kalman  filter,  but  they  do  make  it  faster  than  other  factored  forms. 

This research also showed a link between the influence diagram algorithm and matrix operations.
These matrix operations are known to be stable and demonstrate the stability of the
influence  diagram  algorithm.  The  errors  caused  by  the  influence  diagram  are  directly  related  to  the 
types  of  errors  caused  by  the  inversion  of  a  unit  triangular  matrix.  These  errors  are  very  similar  to 
the  errors  that  occur  in  the  U-D  filter. 

The  influence  diagram  allows  a  pictorial  view  of  the  conditional  mean  estimator.  It  can  be 
used  for  discrete-time  filtering,  resulting  in  excellent  numeric  properties.  Although  the  influence 
diagram is not as fast as the Kalman filter, it appears to offer the best trade-off between speed and
numeric precision of any known discrete-time filtering algorithm.

AN ALTERNATIVE ALGORITHM FOR DISCRETE-TIME FILTERING


I.  Introduction 

The influence diagram is a tool for decision analysis [10, 11, 12, 14]. It is a pictorial description
of  conditional  relationships  between  a  given  set  of  random  variables.  If  one  random  variable  is 
conditioned  on  another,  then  the  conditioning  order  can  be  reversed  by  using  Bayes’  rule.  The 
influence  diagram  can  be  used  to  show  changes  in  the  relationships  between  the  random  variables 
when  the  conditioning  order  changes. 

Associated with the influence diagram are the conditional and unconditional probability
distributions for each of the random variables in the diagram. Taken together, the conditioning
relationships  and  the  associated  distributions  are  sufficient  to  describe  a  joint  distribution  function 
for  the  given  set  of  random  variables. 

If  all  the  random  variables  of  a  set  are  continuous  and  jointly  Gaussian,  then  the  joint  density 
function  can  be  fully  specified  by  a  mean  vector  and  a  covariance  matrix.  For  jointly  Gaussian 
continuous  random  variables,  the  influence  diagram  can  be  used  as  an  alternative  expression  for 
the  joint  density  function.  Instead  of  a  covariance  matrix,  the  influence  diagram  specifies  the 
unconditional  and  conditional  variances  of  the  random  variables,  and  the  conditional  relationships 
between  them. 

The  discrete-time  Kalman  filter  is  an  optimal  estimator  for  the  states  of  a  linear,  stochastic 
system  [7].  It  can  be  derived  by  assuming  that  state  estimates  can  be  expressed  as  jointly  Gaussian 
random  variables.  Using  linear  operations  and  Bayes’  rule,  the  estimate  of  the  state  at  any  time 
can  be  calculated  as  a  vector  of  conditional  means  and  a  conditional  covariance  matrix,  conditioned 
on current and previous measurements.

The linear operations of a linear stochastic system can also be expressed in influence diagram
form. Furthermore, the influence diagram can be used to represent the conditional vector of means
and conditional covariance matrix, conditioned on previous and current measurements, or equivalently,
the most recent previous state estimates and the current measurements. Thus, the influence
diagram affords an alternative algorithm for discrete-time filtering.

One limitation of the Kalman filter is that, under certain circumstances, it can have problems
with numeric accuracy due to fixed computer wordlength and roundoff error [2, 7]. There are
various ways of addressing this type of problem. A simple, but perhaps costly, way is to increase
the wordlength of the computer. Another way is to use alternative algorithms which have better
numerical properties than Kalman’s original equations, with an associated increase in computational
requirements. One alternative algorithm which offers a large benefit in numeric accuracy, at
a moderate increase in computational burden, is the U-D factored form of the Kalman filter [2].
Because  of  these  properties,  the  U-D  filter  will  be  a  baseline  for  comparison  in  the  remainder  of 
this  thesis. 

1.1  Summary  of  Research 

The  Gaussian  influence  diagram  draws  from  different  fields  of  expertise  [1,  6,  8,  9,  10,  11,  12, 
17].  It  is  a  graphical  method  of  showing  the  conditional  means  and  variances  of  jointly  Gaussian 
random  variables.  It  represents  the  relationship  between  such  variables  as  the  conditioning  of  one 
on another. The algorithm for manipulating the Gaussian influence diagram takes advantage of
the  special  characteristics  of  the  joint  Gaussian  distribution. 

The  theory  behind  the  continuous  random  variable  form  of  the  influence  diagram  dates  from 
early  work  in  statistical  analysis.  Some  of  the  notation  for  the  influence  diagram  can  be  traced  to 
Yule’s  notation  for  partial  regression  coefficients  of  multiple  variables  [17].  The  influence  diagram 
depicts  the  relationships  between  conditional  random  variables  and  those  random  variables  upon 
which they are conditioned. Both Mood and Graybill [8:pp. 198-215] and Anderson [1:pp. 5-34]
present a thorough summary of the characteristics of conditional Gaussian random variables. Some
of the applicable properties are repeated here for continuity.


If x is a Gaussian random vector that has components $x_1, x_2, \ldots, x_p$, and the components
$x_1$ through $x_q$ are conditioned on the components $x_{q+1}$ through $x_p$, then the mean of any of the
conditioned variables can be expressed as a linear function of the realizations of the conditioning
variables. This linear relationship can be expressed in terms of a matrix of regression coefficients. If
$x_i$ is a component of the conditioned set of variables, and $x_j$ is a component of the conditioning set
of variables, then the i, jth element of this matrix is $\beta_{ij.q+1,\ldots,j-1,j+1,\ldots,p}$, where the subscript
implies that this scalar is the partial derivative of $x_i$’s conditional mean with respect to $x_j$’s value.
The subscript terms after the period are the conditioning variables for the coefficient.


The  matrix  of  regression  coefficients  is  closely  related  to  the  covariance  matrix.  If  a  Gaussian 
random  vector  is  partitioned  into  two  random  vectors  x  and  y,  then  the  covariance  matrix  can  be 
partitioned  as 


$$P = \begin{bmatrix} P_{xx} & P_{xy} \\ P_{yx} & P_{yy} \end{bmatrix} \tag{1}$$

The matrix of regression coefficients of x on y is $P_{xy}P_{yy}^{-1}$. If $m_x$ represents the unconditional
mean vector for x, and $m_y$ represents the unconditional mean vector for y, then the conditional
mean of x given y is given by the equation (also called the regression function):

$$m_{x|y} = m_x + P_{xy}P_{yy}^{-1}(\rho - m_y) \tag{2}$$

where $\rho$ is a dummy variable for the realization of the random vector y. Furthermore, the conditional
variance of x given y is:

$$P_{x|y} = P_{xx} - P_{xy}P_{yy}^{-1}P_{yx} \tag{3}$$


These equations show that, for a given set of conditioning variables, the matrix of regression
coefficients and the conditional covariance matrix are fixed. Furthermore, the conditional mean is
a linear function of the realizations of the conditioning random variables [1:pp. 27-30].
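
As a minimal numerical sketch of the regression function and the conditional variance, consider the simplest case in which x and y are both scalars, so that every partition of the covariance matrix is a single number. The function name and the numerical values below are illustrative assumptions and are not taken from the references.

```python
def condition_scalar_gaussian(m_x, m_y, p_xx, p_xy, p_yy, rho):
    """Condition a scalar Gaussian x on the realization rho of a scalar Gaussian y.

    Returns (regression coefficient, conditional mean, conditional variance),
    i.e. the regression function and conditional variance with 1-by-1 partitions.
    """
    if p_yy <= 0.0:
        raise ValueError("the variance of y must be positive")
    beta = p_xy / p_yy                  # P_xy * P_yy^{-1}
    mean = m_x + beta * (rho - m_y)     # conditional mean of x given y = rho
    var = p_xx - beta * p_xy            # P_xx - P_xy P_yy^{-1} P_yx (P_yx = P_xy here)
    return beta, mean, var


# Illustrative values only.
beta, mean, var = condition_scalar_gaussian(
    m_x=1.0, m_y=2.0, p_xx=4.0, p_xy=1.5, p_yy=3.0, rho=4.0)
print(beta, mean, var)
```

In this sketch the realization rho enters only the conditional mean. The regression coefficient and the conditional variance depend solely on the covariance terms, which is the sense in which they are fixed for a given set of conditioning variables.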

In  his  1986  doctoral  dissertation,  Kenley  proved  that  the  covariance  matrix  for  a  Gaussian 
random  vector  could  be  expressed  in  terms  of  a  matrix  of  regression  coefficients  and  a  matrix  of 
conditional  variances  [6:pp.  29-31].  Kenley’s  regression  coefficient  matrix  is  not  constructed  in  the 
same way as the matrix $P_{xy}P_{yy}^{-1}$ given above. Instead of using the same conditioning variables (the
random  vector  y),  for  all  the  regression  coefficients,  Kenley  constructed  his  matrix  in  the  following 
way. 

Assume one variable is chosen from the random vector. The choice may be arbitrary, but
for the purposes of this explanation, it will be labeled $x_1$. A second variable, perhaps arbitrary
also, is chosen and expressed in conditional form on the first variable. This second variable will be
labeled $x_2$. In order to express this second variable in conditional form, it is sufficient to calculate
the regression coefficient $\beta_{21}$ and the conditional variance of $x_2$ given $x_1$. A third variable is
chosen and conditioned on the first two using the regression coefficients $\beta_{31}$ and $\beta_{32.1}$. The
variance of the third variable is also expressed as a conditional variance, conditioned on the first
two random variables. The process continues until all random variables in the vector have been
chosen and conditioned on the previously chosen variables.

The  regression  coefficients  just  calculated  can  be  expressed  in  matrix  form  as  a  strictly  upper 
triangular  matrix  B.  Kenley  uses  conventional  row-column  notation  for  this  matrix.  Kenley’s  new 
notation  is  more  convenient  for  influence  diagrams  because,  as  will  be  shown  later,  there  is  no 
need  to  keep  track  of  the  conditioning  variables  with  subscripts.  The  influence  diagram  graphically 
depicts  all  conditioning  variables.  In  addition,  the  order  of  the  first  two  numbers  in  the  subscript  is 
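reversed to make the notation compatible with matrix operations. The B matrix can be expressed
using either notation as:

$$
B =
\begin{bmatrix}
0 & \beta_{21} & \beta_{31} & \beta_{41} & \cdots & \beta_{n1} \\
0 & 0 & \beta_{32.1} & \beta_{42.1} & \cdots & \beta_{n2.1} \\
0 & 0 & 0 & \beta_{43.21} & \cdots & \beta_{n3.21} \\
\vdots & & & \ddots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}
=
\begin{bmatrix}
0 & b_{12} & b_{13} & b_{14} & \cdots & b_{1n} \\
0 & 0 & b_{23} & b_{24} & \cdots & b_{2n} \\
0 & 0 & 0 & b_{34} & \cdots & b_{3n} \\
\vdots & & & \ddots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\tag{4}
$$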

The conditional variances can also be expressed in matrix form. This matrix is diagonal,
ordered as the variables were chosen earlier, $x_1, x_2, x_3, \ldots, x_n$. The conditioning variables for any
conditional variance lie above it on the diagonal. Since $x_1$ is first, it is not conditioned on any
other variables and the variance is in unconditional form. The notation for the variance of the ith
random variable is $v_i$ whether the variable is expressed in conditional or unconditional form. The
diagonal matrix D is expressed as:


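$$
D =
\begin{bmatrix}
v_1 & & & & 0 \\
 & v_2 & & & \\
 & & v_3 & & \\
 & & & \ddots & \\
0 & & & & v_n
\end{bmatrix}
\tag{5}
$$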
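
The sequential conditioning described above can be sketched in a few lines of Python. The sketch assumes a positive definite covariance matrix and assumes that each off-diagonal entry of B is the coefficient of an earlier variable in the regression of a later variable on all previously chosen variables. The helper names (`solve`, `covariance_to_b_d`, `b_d_to_covariance`) and the example matrix are hypothetical; this is an illustration of the idea, not Kenley's implementation.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def covariance_to_b_d(p):
    """Return (B, v) with B strictly upper triangular and v the diagonal of D."""
    n = len(p)
    b = [[0.0] * n for _ in range(n)]
    v = [p[0][0]] + [0.0] * (n - 1)      # first variable is unconditional
    for j in range(1, n):
        p_yy = [row[:j] for row in p[:j]]           # covariance of predecessors
        p_yx = [p[i][j] for i in range(j)]          # cross covariance with x_j
        coeffs = solve(p_yy, p_yx)                  # regression coefficients
        for i in range(j):
            b[i][j] = coeffs[i]
        v[j] = p[j][j] - sum(p_yx[i] * coeffs[i] for i in range(j))
    return b, v


def b_d_to_covariance(b, v):
    """Rebuild the covariance matrix from B and the conditional variances."""
    n = len(v)
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):
            p[j][k] = p[k][j] = sum(b[i][j] * p[i][k] for i in range(j))
        p[j][j] = v[j] + sum(b[i][j] * p[i][j] for i in range(j))
    return p


# Illustrative 3-by-3 covariance matrix.
P = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]
B, v = covariance_to_b_d(P)
print(B)
print(v)
print(b_d_to_covariance(B, v))
```

The round trip through `b_d_to_covariance` illustrates the statement that the covariance matrix can be expressed in terms of the regression coefficients and the conditional variances. Indices in the code are zero-based, so `B[0][1]` corresponds to $b_{12}$.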


The  influence  diagram  was  introduced  in  decision  theory  for  visualizing  the  “influence”  of 
random  variables  on  decisions  [10,  11].  It  graphically  depicts  the  relationships  between  random 
variables, deterministic variables, decisions, and value functions. For this research, however, only
random and deterministic variables will be considered. Influence diagrams are relatively new to the
field of decision theory and are not widely used outside of the field. In order to make the rest of this
research easier to understand, Chapter 2 is an introduction to influence diagrams as specifically
applied  to  probabilistic  analysis  and  discrete-time  filtering. 

Initially,  the  random  variables  in  influence  diagrams  were  assumed  to  be  discrete  random 
variables or deterministic variables, each with discrete probability distribution functions [10]. Kenley
later derived the mathematics for influence diagrams with continuous random variables. He
specifically  applied  influence  diagrams  to  the  case  of  jointly  Gaussian  random  variables,  although 
he  also  extended  the  analysis  to  other  continuous  random  variables  [6:pp.  12-34]. 

Kenley  also  demonstrated  that  the  influence  diagram  can  be  used  for  discrete-time  filtering, 
and  is  equivalent  to  the  discrete-time  Kalman  filter  [6:pp.  52-106].  He  showed  that,  when  the 
influence  diagram  is  used  for  discrete-time  filtering,  it  requires  about  the  same  number  of  floating 
point  operations  as  the  U-D  factored  form  of  the  Kalman  filter  [6:pp.  89-106]. 

The  discrete-time  Kalman  filter  is  known  to  have  problems  with  roundoff  error  and  numeric 
instability  when  implemented  on  fixed-wordlength  computers  [2,  4,  5].  Furthermore,  the  U-D 
covariance  factorization  algorithm  has  been  shown  to  have  better  numerical  properties  than  the 
conventional  Kalman  filter  algorithm  [2,  4].  However,  to  the  author’s  knowledge,  there  has  been 
no  research  into  the  numeric  properties  of  the  influence  diagram.  Specifically,  there  has  been  no 
comparison  of  the  numeric  properties  of  the  influence  diagram  algorithm  for  discrete-time  filtering 
to  the  conventional  Kalman  filter  or  to  the  U-D  factored  form  of  the  Kalman  filter. 

This  research  will  investigate  properties  of  the  influence  diagram  algorithm  and  compare  them 
to  the  properties  of  the  U-D  filter.  Specifically,  this  research  will  compare  the  efficiency  and  numeric 
properties  of  these  two  algorithms.  The  numeric  properties  of  the  influence  diagram  algorithm  will 
be  shown  by  relating  them  to  matrix  operations.  The  numerical  stability,  properties,  and  error 
bounds of these matrix operations come from Wilkinson [15, 16].

1.2 Thesis Objectives


This  thesis  is  intended  to  build  upon  the  research  of  Kenley.  It  will  explore  the  properties  of 
the influence diagram, specifically as applied to discrete-time filtering. Additionally, the influence
diagram  algorithm  will  be  compared  to  both  the  conventional  Kalman  filter  algorithm  and  the  U-D 
filter.  The  results  will  be  a  comparison  of  the  advantages  and  disadvantages  of  each  filter  algorithm. 

There  are  two  reasons  for  using  the  U-D  filter  for  comparison.  One  reason  is  the  similarity 
between  the  U-D  filter  and  the  influence  diagram.  This  similarity  will  be  a  topic  later  in  the  thesis. 
The  other  reason  is  that  the  U-D  filter  represents  a  standard  in  terms  of  numerical  accuracy  and 
computational  loading.  Its  properties  are  well  researched  and  documented  [2,  3,  4,  5]. 

This  analysis  of  the  influence  diagram  will  begin  with  an  overview  of  its  principles  and  purpose. 
Even  though  the  influence  diagram  can  be  used  for  decision  analysis  and  maximizing  (or  minimizing) 
multivariate  value  functions,  these  aspects  will  not  be  discussed.  Instead,  this  thesis  will  address 
only  the  probabilistic  applications  of  the  influence  diagram.  The  discussion  will  be  a  review  of  the 
work of Shachter, Kenley, and Tatman [10, 11, 6, 12, 14].

The main application of the influence diagram, for the purposes of this research, is an alternative
algorithm for discrete-time Kalman filtering. For this reason, there will be a brief discussion
of  the  theory  of  the  Kalman  filter,  paralleled  with  an  explanation  of  how  the  influence  diagram 
implements  the  same  operations.  Again,  this  explanation  will  be  a  review  of  previous  work  by 
Kenley  [6:pp.  52-106]. 

The  main  body  of  this  research  will  follow  the  review  and  will  focus  on  three  major  areas. 
The  first  topic  will  be  a  discussion  of  how  the  influence  diagram  can  be  solved  efficiently.  It  will  be 
a  demonstration  of  practical  methods  for  implementation.  A  result  of  this  demonstration  will  be 
a tally of the mathematical operations needed for the complete filter. It will be shown that, under
certain conditions, the operation count can be reduced from previous estimates [6:pp. 89-106].
More importantly, the algorithm lends itself to a pipeline architecture which can lead to increased
calculation  speed. 

The second major topic will be a comparison of the influence diagram with matrix operations.
The  matrix  operations  add  insight  into  the  mathematics  involved  and  are  easier  to  analyze.  This 
relationship  to  matrix  operations  forms  the  basis  for  the  third  major  topic,  numerical  properties. 

To  this  author’s  knowledge,  there  has  been  no  previous  literature  on  the  numerical  properties 
of  the  influence  diagram.  Neither  has  the  influence  diagram  been  included  in  any  experimental 
comparisons  of  different  Kalman  filter  algorithms.  No  such  experimental  comparisons  were  made 
in  this  research.  Instead,  this  research  explores  the  theoretical  numerical  properties,  based  on  the 
known  numerical  properties  of  the  matrix  operations. 

There  are  some  important  results  from  the  theoretical  research  in  this  thesis.  First,  it  will  be 
shown  that  the  influence  diagram  algorithm,  like  the  U-D  factored  filter,  uses  a  stable  algorithm  to 
calculate  the  means  and  conditional  variances  of  random  variables.  Second,  the  numeric  properties 
of this algorithm can be compared to the properties of the U-D filter. Third, this research offers insight into
the conditions that do cause numerical problems for the influence diagram.

1.3  Thesis  Overview 

This  chapter  reviewed  some  of  the  existing  research  in  influence  diagrams  and  their  use  as  an 
algorithm for discrete-time filtering. It also indicated the outline for the rest of this thesis. Chapter 2
will  be  a  tutorial  approach  to  influence  diagrams.  It  will  attempt  to  explain  the  applications  and 
operation  of  the  influence  diagram,  specifically  its  application  to  discrete-time  filtering.  Chapter  3 
will  demonstrate  efficient  implementation  of  the  influence  diagram,  including  operation  counts. 
Chapter  4  will  demonstrate  the  relationship  of  influence  diagram  operations  to  matrix  operations. 
This will lead into Chapter 5, which will be a description of the numerical properties. Finally,
Chapter 6 will conclude and make recommendations for future research.

II. Influence Diagrams


The influence diagram is a recent tool in decision analysis and is not well known beyond
this field. This chapter is a short description of the influence diagram, concentrating on its use
for probabilistic analysis. For a more rigorous description of influence diagrams, refer to Tatman,
Shachter, or Kenley [6, 10, 11, 14].

An  influence  diagram  represents  deterministic,  random,  decision,  and  value  function  variables 
by  using  a  node  for  each  variable.  For  probabilistic  analysis,  only  two  types  of  nodes  are  necessary. 
A  single  circle  node  represents  a  random  variable.  A  double  circle  node  represents  a  deterministic 
variable  or,  in  other  words,  a  variable  that  can  be  calculated  exactly  as  a  function  of  other  variables. 

In general, a random variable can have a discrete probability distribution, a continuous probability
density, or a combination of the two. Initially, influence diagrams were only used for random
variables  with  discrete  probability  distributions;  therefore,  this  discussion  will  begin  with  this  case 
also. 

2.1 Discrete Random Variables

Assume two random variables, x and y, each with discrete probability distributions. The variable
x can take on discrete values $x_1, x_2, x_3, \ldots, x_n$ with probabilities $p(x_1), p(x_2), p(x_3), \ldots, p(x_n)$
such that $P[x = x_i] = p(x_i)$. The probabilities for the random variable y can be similarly defined
as $P[y = y_j] = p(y_j)$ where the random variable y can take on the discrete values $y_1, y_2, y_3, \ldots, y_m$.

For  two  such  random  variables,  a  joint  distribution  can  be  defined.  This  joint  distribution 
assigns  a  probability  to  each  discrete  pair  of  values  (realizations)  of  the  random  variables.  The 
two  random  variables  can  be  associated  with  a  random  vector  where  the  elements  of  the  vector  are 
the  individual  random  variables.  The  joint  distribution  for  these  random  variables  becomes  the 
probability distribution for the two dimensional random vector. For each pair of discrete values
$x_i$ and $y_j$, the probability $P[x = x_i, y = y_j] = p(x_i, y_j)$ is the probability that x and y take on the
values $x_i$ and $y_j$.

A  joint  distribution  function  of  two  variables  can  also  be  expressed  in  terms  of  the  conditional 
distribution  of  one  random  variable,  conditioned  on  the  outcome  of  the  other  random  variable. 
Bayes’ rule demonstrates the mathematics associated with this expression. One form of Bayes’ rule
is $p(x_i, y_j) = p(x_i|y_j)p(y_j)$, where $p(x_i|y_j)$ means the probability that x takes on the value $x_i$, given
that y has taken on the value $y_j$. The joint probability distribution can also be expressed, again
using Bayes’ rule, in terms of the conditional distribution of y given a value of x. This is written
as $p(x_i, y_j) = p(y_j|x_i)p(x_i)$.

The probability $p(y_j)$ can be calculated by summing the probabilities of $p(x_i, y_j)$ over all possible
values of $x_i$. This distribution is also referred to as the marginal or unconditional probability
distribution of y and can be represented in equation form as:

$$p(y_j) = \sum_{x_i} p(x_i, y_j) = \sum_{x_i} p(x_i|y_j)p(y_j) = \sum_{x_i} p(y_j|x_i)p(x_i) \tag{6}$$

The  marginal  probability  distribution  of  x  could  be  computed  similarly,  except  that  the  summation 
would  be  over  all  possible  values  of  y. 

The  influence  diagram  represents  the  joint  probability  distribution  using  the  conditional  form 
of Bayes’ rule. For example, the joint probability distribution for the previously described two
dimensional random vector is shown in the first influence diagram in Figure 1. In this diagram,
an arrow points from node y to node x. The arrow implies that y is specified as the marginal
distribution $p(y_j)$, and that x is expressed in terms of the conditional distribution $p(x_i|y_j)$. The
second  influence  diagram  in  Figure  1  depicts  the  same  joint  probability  distribution  in  terms  of  the 
marginal  distribution  of  x  and  the  conditional  distribution  of  y  given  x. 




If the joint probability distribution is specified as in the first influence diagram in Figure 1,
then it can be converted to the second diagram by reversing the arrow. Mathematically, the correct
distributions can be calculated with Bayes’ rule. These operations are

$$p(x_i) = \sum_{y_j} p(x_i|y_j)p(y_j) \tag{7}$$

$$p(y_j|x_i) = \frac{p(x_i|y_j)p(y_j)}{p(x_i)} \tag{8}$$

Figure 1. Influence Diagram for the Joint Distribution of $p(x_i, y_j)$


The  mathematics  in  Equations  (7)  and  (8)  are  not  shown  explicitly,  but  are  implied  by 
the  diagram.  This  attribute  makes  the  influence  diagrams  especially  attractive  for  computation 
by  digital  computer.  The  influence  diagram  is  the  pictorial  representation  of  the  relationships 
between  the  variables.  The  digital  computer  accomplishes  the  mathematics  needed  for  computing 
the  underlying  distribution  functions. 
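
As a hedged illustration of the computation that the arrow reversal implies, the following Python sketch stores the two-node diagram as dictionaries and applies Bayes’ rule. The function name `reverse_arc`, the dictionary layout, and the probability values are assumptions made for this example only.

```python
def reverse_arc(p_y, p_x_given_y):
    """Reverse the arc y -> x of a two-node discrete influence diagram.

    p_y:          dict mapping each value of y to p(y)
    p_x_given_y:  dict mapping (x, y) pairs to p(x | y)
    Returns (p_x, p_y_given_x), keyed by x and by (y, x) respectively.
    """
    # Marginal of x: sum p(x | y) p(y) over the values of y.
    p_x = {}
    for (x, y), p in p_x_given_y.items():
        p_x[x] = p_x.get(x, 0.0) + p * p_y[y]

    # New conditional of y given x from Bayes' rule.
    p_y_given_x = {}
    for (x, y), p in p_x_given_y.items():
        if p_x[x] > 0.0:  # values of x with zero probability are skipped
            p_y_given_x[(y, x)] = p * p_y[y] / p_x[x]
    return p_x, p_y_given_x


# Illustrative distributions: y is marginal, x is conditioned on y.
p_y = {"y1": 0.3, "y2": 0.7}
p_x_given_y = {
    ("x1", "y1"): 0.9, ("x2", "y1"): 0.1,
    ("x1", "y2"): 0.2, ("x2", "y2"): 0.8,
}
p_x, p_y_given_x = reverse_arc(p_y, p_x_given_y)
print(p_x)
print(p_y_given_x)
```

After the reversal, x carries a marginal distribution and y a conditional one, while the joint distribution itself is unchanged: for every pair, the product of the new factors equals the product of the old factors.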

An  influence  diagram  with  only  two  nodes  (random  variables)  is  a  simplistic  case.  More 
commonly,  there  will  be  many  random  variables.  Conditional  distributions  are  associated  with 
those  nodes  having  an  arrow  pointing  to  them  from  other  nodes.  A  conditional  random  variable  may 
be conditioned on any number of the remaining variables in the diagram, with certain restrictions.
Taken together, all of the random variables in a diagram form a random vector. The conditional
and unconditional distributions of these random variables represent the joint distribution for the
random vector.

As  an  example,  consider  the  joint  distribution  of  three  random  variables  with  probabilities 
expressed in terms of ordered triples, $P[x = x_i, y = y_j, z = z_k] = p(x_i, y_j, z_k)$. This joint distribution
can be expressed in terms of the marginal distribution of one variable, the conditional distribution
of a second variable given the first, and the conditional distribution of the third, given the first two.
The overall joint distribution remains unchanged, but it is expressed in a different form. This can
be shown mathematically as $p(x_i, y_j, z_k) = p(x_i)p(y_j|x_i)p(z_k|x_i, y_j)$ and in influence diagram form
in Figure 2.
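
A short Python sketch can make this three-variable factorization concrete. It builds the joint distribution from one marginal and two conditional tables, under the assumption that each table is stored as a dictionary. The function name `joint_from_factors`, the key layout, and the binary-valued example tables are hypothetical.

```python
def joint_from_factors(p_x, p_y_given_x, p_z_given_xy):
    """Return p(x, y, z) = p(x) p(y | x) p(z | x, y) as a dict keyed by (x, y, z).

    p_x:           dict mapping x to p(x)
    p_y_given_x:   dict mapping (y, x) to p(y | x)
    p_z_given_xy:  dict mapping (z, x, y) to p(z | x, y)
    """
    joint = {}
    for (z, x, y), pz in p_z_given_xy.items():
        joint[(x, y, z)] = p_x[x] * p_y_given_x[(y, x)] * pz
    return joint


# Illustrative tables for three binary variables.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}
p_z1 = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.8}  # p(z = 1 | x, y)
p_z_given_xy = {}
for (x, y), p in p_z1.items():
    p_z_given_xy[(1, x, y)] = p
    p_z_given_xy[(0, x, y)] = 1.0 - p

joint = joint_from_factors(p_x, p_y_given_x, p_z_given_xy)
print(sum(joint.values()))  # total probability of the joint distribution
```

Summing the resulting table over y and z recovers the marginal distribution of x, which is one way to check that the factored form and the joint form describe the same distribution.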


The  following  is  some  terminology  relating  to  the  influence  diagram.  The  nodes  which  have 
arrows  pointing  to  another  node  are  called  the  conditional  or  direct  predecessors  of  that  node.  A 
predecessor  node  may  itself  have  a  predecessor,  and  that  predecessor  may  have  one  also.  The  set 
of  all  predecessors  of  a  node,  both  direct  and  indirect,  is  the  set  of  weak  predecessors.  Similarly,  a 
node  with  a  conditional  distribution  is  called  a  successor  node.  A  list  of  nodes  is  called  ‘ordered’  if 
none  of  the  weak  predecessors  of  a  node  follow  the  node  in  the  list.  An  influence  diagram  is  defined 
to  be  an  ordered  list  of  nodes,  corresponding  to  a  unique  joint  distribution  function.  If  a  sequence 
of  nodes  cannot  be  ordered,  then  a  cycle  is  said  to  exist. 
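
The ordering requirement can be checked mechanically. The following Python sketch, which assumes the diagram is stored as a mapping from each node to its direct predecessors, returns an ordered list or reports a cycle. The function name `order_nodes` and the data layout are illustrative choices rather than part of the cited work.

```python
def order_nodes(predecessors):
    """Return an ordered list of nodes, or raise ValueError if a cycle exists.

    predecessors maps each node to a collection of its direct predecessors.
    In the returned list, no predecessor of a node follows that node.
    """
    remaining = {node: set(preds) for node, preds in predecessors.items()}
    # Nodes that appear only as predecessors have no predecessors themselves.
    for preds in predecessors.values():
        for node in preds:
            remaining.setdefault(node, set())

    ordered = []
    while remaining:
        # Nodes whose predecessors have all been placed already.
        ready = sorted(n for n, preds in remaining.items() if not preds)
        if not ready:
            raise ValueError("cycle: the nodes cannot be ordered")
        for node in ready:
            ordered.append(node)
            del remaining[node]
        for preds in remaining.values():
            preds.difference_update(ready)
    return ordered


# Three nodes: x is marginal, y is conditioned on x, z on both x and y.
print(order_nodes({"x": [], "y": ["x"], "z": ["x", "y"]}))

# Circular conditioning is rejected.
try:
    order_nodes({"x": ["y"], "y": ["x"]})
except ValueError as err:
    print(err)
```

The sketch sorts the ready nodes only to make the output deterministic, so node labels are assumed to be mutually comparable (for example, all strings).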

The  influence  diagram  in  Figure  2  demonstrates  the  concept  of  an  ordered  list.  The  nodes 
can be ordered as x, y, and z. Node y follows node x in the list, and node y is not a predecessor
of x. Similarly, node z follows both nodes x and y, and z is not a predecessor of either. The
influence  diagram  requirement  for  an  ordered  list  is  derived  from  the  conditioning  of  Bayes’  rule. 
Conceptually,  if  one  node  is  conditioned  on  a  predecessor  node,  then  that  predecessor  node  may 
not  be  conditioned  on  the  successor,  either  directly  or  indirectly.  To  do  so  would  imply  circular 
conditioning. 

Transformations  can  change  the  appearance  of  the  influence  diagram  and  affect  the  conditional 
distributions  underlying  the  diagram.  For  the  purpose  of  this  thesis,  these  transformations  are 
limited  to  the  arc  reversal  and  the  elimination  of  ‘barren’  or  ‘nuisance’  nodes.  These  operations 
will  be  explained  by  example. 

2.2  Example  of  Discrete  Random  Variables

Assume three random variables described as follows. Random variables x, y, and z indicate
whether components x, y, and z in a machine have failed after a given amount of time, with a value
of 1 denoting failure. It is known that a failure of either component x or z induces a higher rate of
failure in component y. It is possible to determine if components y and z have failed by direct
measurement, but component x must be removed to be checked, a costly procedure. The problem is
to calculate the probability of failure of component x, given the conditions of components y and z.
Known probabilities of failure after a given period of time are listed in Table 1.

The influence diagram in Figure 3 depicts this three-variable example. The random variables
x and z are expressed as marginal distributions. They are not conditioned on any other random
variables. The random variable y is conditioned on both x and z. This means that y is conditioned
on the random vector (x, z). This diagram also shows x and z to be independent by the lack
of an arc between them. The influence diagram does not show the actual distributions of the
random variables. It is simply a pictorial method of bookkeeping; the influence diagram shows
the conditioning order of the random variables. Both the influence diagram and the tabular data
associated with it are needed to express the overall joint distribution.


Table 1. Initial Distributions for Discrete Influence Diagram Example

P(z = 0) = 0.25                       P(z = 1) = 0.75
P(x = 0) = 0.6                        P(x = 1) = 0.4
P(y = 0 | z = 0, x = 0) = 0.5         P(y = 1 | z = 0, x = 0) = 0.5
P(y = 0 | z = 0, x = 1) = 0.25        P(y = 1 | z = 0, x = 1) = 0.75
P(y = 0 | z = 1, x = 0) = 0.2         P(y = 1 | z = 1, x = 0) = 0.8
P(y = 0 | z = 1, x = 1) = 0.1         P(y = 1 | z = 1, x = 1) = 0.9

Figure  3.  Influence  Diagram  for  Discrete  Random  Variable  Example 


The influence diagram arrows in Figure 3 follow the direction of causality. In general, the
arrows do not imply causality. In this case, however, the causal factors of the process model
determine the initial probabilistic relationships. After some mathematical calculations, the
probabilistic relationships will be expressed in a different form, and the arrows in the influence
diagram will no longer have any relationship to causality.


The desired distribution is the probability that component x has failed, given the conditions
of components y and z. This distribution is given in Table 2 and the influence diagram in Figure 4.
The probability distribution of y given z is calculated by summing the distribution of y given x and
z, weighted by the distribution of x given z, over all values of x. The probability of x given y and z
is calculated by using Bayes’ rule. The specific equations are:

$$p(y_j \mid z_k) = \sum_i p(y_j \mid x_i, z_k)\, p(x_i \mid z_k) \tag{9}$$

$$p(x_i \mid y_j, z_k) = \frac{p(y_j \mid x_i, z_k)\, p(x_i \mid z_k)}{p(y_j \mid z_k)} \tag{10}$$

But because x and z are independent, $p(x_i \mid z_k) = p(x_i)$. The resultant transformation is an
example of arc reversal. The result is the probability of component x having failed, given the
condition of the two observed components. The underlying joint distribution is unchanged.
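
The arc reversal of Equations (9) and (10) can be checked numerically. The following is a minimal sketch, not the author's implementation: the dictionary layout and the function name `reverse_arc_x_y` are assumptions, and the complementary entries of Table 1 (for example P(x = 0) = 0.6) are filled in so that each distribution sums to one.

```python
# Table 1 values (1 = failed, 0 = not failed).
p_x = {0: 0.6, 1: 0.4}
# p_y_given_zx[(z, x)][y] = P(y | z, x)
p_y_given_zx = {
    (0, 0): {0: 0.5, 1: 0.5},
    (0, 1): {0: 0.25, 1: 0.75},
    (1, 0): {0: 0.2, 1: 0.8},
    (1, 1): {0: 0.1, 1: 0.9},
}

def reverse_arc_x_y(p_x, p_y_given_zx):
    """Reverse the arc x -> y. Since x and z are independent, p(x | z) = p(x)."""
    p_y_given_z = {}
    p_x_given_zy = {}
    for z in (0, 1):
        # Equation (9): sum over all values of x.
        p_y_given_z[z] = {
            y: sum(p_y_given_zx[(z, x)][y] * p_x[x] for x in (0, 1))
            for y in (0, 1)
        }
        for y in (0, 1):
            # Equation (10): Bayes' rule, using the sum above as denominator.
            p_x_given_zy[(z, y)] = {
                x: p_y_given_zx[(z, x)][y] * p_x[x] / p_y_given_z[z][y]
                for x in (0, 1)
            }
    return p_y_given_z, p_x_given_zy

p_y_given_z, p_x_given_zy = reverse_arc_x_y(p_x, p_y_given_zx)
for (z, y), dist in sorted(p_x_given_zy.items()):
    print(f"P(x=0 | z={z}, y={y}) = {dist[0]:.6f}   P(x=1 | z={z}, y={y}) = {dist[1]:.6f}")
```

The printed values can be compared against the entries of Table 2.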


Table 2. Conditional Distribution of x

P(z = 0) = 0.25                            P(z = 1) = 0.75
P(y = 0 | z = 0) = 0.4                     P(y = 1 | z = 0) = 0.6
P(y = 0 | z = 1) = 0.16                    P(y = 1 | z = 1) = 0.84
P(x = 0 | z = 0, y = 0) = 0.75             P(x = 1 | z = 0, y = 0) = 0.25
P(x = 0 | z = 0, y = 1) = 0.5              P(x = 1 | z = 0, y = 1) = 0.5
P(x = 0 | z = 1, y = 0) = 0.75             P(x = 1 | z = 1, y = 0) = 0.25
P(x = 0 | z = 1, y = 1) = 0.571429         P(x = 1 | z = 1, y = 1) = 0.428571

Figure  4.  Influence  Diagram  for  Conditional  Distribution  of  x 


An important characteristic of arc reversal is that both nodes involved in the reversal
inherit each other’s direct predecessors. This is a consequence of Bayes’ rule, as seen from this example.
In  this  case,  node  x  inherits  the  predecessors  of  node  y,  namely  node  z.  Similarly,  if  another  node 
had  been  a  predecessor  of  node  x,  then  it  would  have  become  a  predecessor  of  node  y  after  the 
reversal.  This  can  be  seen  in  the  above  equations  by  placing  another  conditioning  variable  on  x. 
This  additional  conditioning  variable  would  have  also  shown  up  as  a  conditioning  variable  of  y  in 
Equation  (9). 

There  is  one  more  point  to  be  made  about  the  resulting  diagram  in  Figure  4.  It  is  not  now 
possible  to  reverse  the  arrow  from  node  z  to  node  x.  This  would  result  in  a  cycle  in  the  influence 
diagram,  and  the  diagram  could  no  longer  be  ordered.  In  general,  two  nodes  can  be  reversed  only  if 
they can be placed next to each other in an ordered sequence of nodes. If node z were to be moved
to the end of the ordered sequence, then it must be done without causing a cycle. For the example
in Figure 4, nodes y and x would be reversed, resulting in the original diagram in Figure 3. Next,
nodes x and z would be reversed. Since these random variables are independent, no mathematical
operations  are  required  for  their  reversal.  Finally,  nodes  y  and  z  could  be  reversed  to  form  the 
diagram  in  Figure  5  and  the  distribution  in  Table  3. 


Table 3. Conditional Distribution of z

P(x = 0) = 0.6                             P(x = 1) = 0.4
P(y = 0 | x = 0) = 0.275                   P(y = 1 | x = 0) = 0.725
P(y = 0 | x = 1) = 0.1375                  P(y = 1 | x = 1) = 0.8625
P(z = 0 | x = 0, y = 0) = 0.454545         P(z = 1 | x = 0, y = 0) = 0.545454
P(z = 0 | x = 0, y = 1) = 0.172414         P(z = 1 | x = 0, y = 1) = 0.827586
P(z = 0 | x = 1, y = 0) = 0.454545         P(z = 1 | x = 1, y = 0) = 0.545454
P(z = 0 | x = 1, y = 1) = 0.217391         P(z = 1 | x = 1, y = 1) = 0.782609

Figure  5.  Influence  Diagram  for  Discrete  Random  Variable  Example 
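
Because every rearrangement leaves the underlying joint distribution unchanged, the entries of Table 3 can also be obtained directly from the joint distribution implied by Figure 3 and Table 1. The sketch below takes that route rather than the sequence of reversals described above. The helper name `marginal` and the data layout are illustrative assumptions.

```python
from itertools import product

# Table 1 values (1 = failed, 0 = not failed).
p_x = {0: 0.6, 1: 0.4}
p_z = {0: 0.25, 1: 0.75}
p_y1_given_zx = {(0, 0): 0.5, (0, 1): 0.75, (1, 0): 0.8, (1, 1): 0.9}  # P(y=1 | z, x)

# Joint distribution p(x, y, z) = p(x) p(z) p(y | z, x).
joint = {}
for x, y, z in product((0, 1), repeat=3):
    p_y = p_y1_given_zx[(z, x)] if y == 1 else 1.0 - p_y1_given_zx[(z, x)]
    joint[(x, y, z)] = p_x[x] * p_z[z] * p_y

def marginal(joint, keep):
    """Sum the joint over every variable whose index is not in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_xy = marginal(joint, keep=(0, 1))                                   # p(x, y)
p_y_given_x = {(x, y): p_xy[(x, y)] / p_x[x] for (x, y) in p_xy}      # keyed (x, y)
p_z_given_xy = {(x, y, z): joint[(x, y, z)] / p_xy[(x, y)]            # keyed (x, y, z)
                for (x, y, z) in joint}

print(round(p_y_given_x[(0, 1)], 6))       # P(y=1 | x=0)
print(round(p_z_given_xy[(1, 1, 0)], 6))   # P(z=0 | x=1, y=1)
```

Working from the full joint is practical only for small discrete examples; the arc reversals in the text reach the same tables without ever forming the joint explicitly.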


A slight change in this example illustrates another of the transformations, the elimination
of barren nodes. Assume this time that the components z and y cannot be tested directly. A
decision to replace component y is based solely on the condition of component x. The desired
distribution is the probability of y given x. The condition of component z is irrelevant to the
decision because it cannot be observed.

The desired distribution is already present in the top part of Table 3. The influence diagram
is shown in Figure 6. In this case, node z was barren, meaning that it was not a predecessor of
the desired nodes. The probabilistic information in node z was already incorporated into node
y by taking the summation as in $p(y_j \mid x_i) = \sum_k p(y_j \mid x_i, z_k)\, p(z_k \mid x_i)$,
where $p(z_k \mid x_i) = p(z_k)$ because x and z are independent. Node z is no longer needed and
can be removed. However, the remaining influence diagram does not represent the original joint
distribution. It represents the joint distribution of the subset of the original random variables,
corresponding to the visible nodes.

Figure 6. Discrete Distribution Influence Diagram with Node Removed
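
The removal of a barren node can be illustrated with the Figure 5 form of the distribution. In this sketch, summing z out of the joint gives the same result as simply deleting node z, because the conditional distribution of z sums to one for every combination of x and y. The helper `bern` and the variable names are assumptions made for the illustration; the probabilities are taken from Table 3.

```python
from itertools import product

# Figure 5 form of the joint distribution (values from Table 3).
p_x = {0: 0.6, 1: 0.4}
p_y1_given_x = {0: 0.725, 1: 0.8625}                    # P(y=1 | x)
p_z1_given_xy = {(0, 0): 0.545454, (0, 1): 0.827586,    # P(z=1 | x, y)
                 (1, 0): 0.545454, (1, 1): 0.782609}

def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

joint = {(x, y, z): p_x[x] * bern(p_y1_given_x[x], y) * bern(p_z1_given_xy[(x, y)], z)
         for x, y, z in product((0, 1), repeat=3)}

# Removing the barren node z by summing it out of the joint distribution.
p_xy_after_removal = {(x, y): joint[(x, y, 0)] + joint[(x, y, 1)]
                      for x, y in product((0, 1), repeat=2)}

# The same distribution read directly from the diagram with node z deleted.
p_xy_from_diagram = {(x, y): p_x[x] * bern(p_y1_given_x[x], y)
                     for x, y in product((0, 1), repeat=2)}

for key in sorted(p_xy_after_removal):
    print(key, round(p_xy_after_removal[key], 6), round(p_xy_from_diagram[key], 6))
```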

To  summarize  in  simple  terms,  making  a  node  less  conditional  involves  taking  the  summation 
of  that  random  variable’s  conditional  distribution  over  one  of  its  direct  predecessors.  In  the  process, 
the  original  successor  node  inherits  all  conditional  predecessors  of  the  original  predecessor.  On  the 
other  hand,  making  a  node  more  conditional  requires  an  application  of  Bayes’  rule.  Specifically,  it 
needs  the  summation  just  calculated  as  the  denominator  term  for  Bayes’  rule.  The  new  successor 
will  inherit  all  conditional  predecessors  of  the  original  successor.  If  the  new  successor  node  is 
barren, and if it has no meaning in the desired result, then it can be removed. In this case, Bayes’ rule
is  not  needed  because  the  new  distribution  of  the  successor  node  will  be  discarded  when  the  node 
is  removed. 

A  deterministic  variable  is  one  that  is  a  function  of  only  the  outcome  of  other  variables. 
The  functional  relationship  between  one  variable  and  another  is  depicted  as  the  conditioning  of 
a  deterministic  variable  upon  those  variables  of  which  it  is  a  function.  For  this  reason,  in  the 
influence  diagram,  the  deterministic  variable  is  usually  shown  as  a  conditional  variable.  If  the 
conditioning  variables  for  a  deterministic  variable  are  random,  then  the  deterministic  variable  can 
be  expressed  as  a  random  variable  in  unconditional  form.  The  distribution  can  be  calculated  by 
taking  summations  as  before. 
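
As a small illustration of a deterministic variable, suppose a hypothetical indicator w equals 1 whenever component x or component z has failed. This variable is not part of the original example; it is introduced here only to show how the unconditional distribution of a deterministic node follows from summing over its random conditioning variables. The probabilities of x and z are those of Table 1.

```python
from itertools import product

p_x = {0: 0.6, 1: 0.4}     # Table 1
p_z = {0: 0.25, 1: 0.75}   # Table 1

def w_of(x, z):
    """Hypothetical deterministic variable: 1 if either component has failed."""
    return 1 if (x == 1 or z == 1) else 0

# Unconditional distribution of w: add p(x) p(z) over the outcomes mapped to each w.
# The product p(x) p(z) is valid because x and z are independent.
p_w = {0: 0.0, 1: 0.0}
for x, z in product((0, 1), repeat=2):
    p_w[w_of(x, z)] += p_x[x] * p_z[z]

print(p_w)
```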
2.3  Continuous  Random  Variables


The  influence  diagram  can  be  extended  to  the  case  of  continuous  random  variables.  If  the 
random  variables  are  Gaussian,  and  all  joint  densities  of  the  variables  are  Gaussian,  then  the 
variables  are  called  jointly  Gaussian.  Gaussian  random  vectors  are  easy  to  represent  because  the 
entire  probability  density  function  can  be  specified  with  only  a  vector  of  the  means  and  a  covariance 
matrix.  It  is  also  known  that  any  linear  combination  of  jointly  Gaussian  random  variables  is 
Gaussian.  Furthermore,  any  subset  of  jointly  Gaussian  variables  is  jointly  Gaussian  itself. 

The  Gaussian  influence  diagram  is  patterned  after  the  discrete  influence  diagram  described 
earlier.  It  consists  of  a  set  of  jointly  Gaussian  random  variables.  Each  node  represents  one  of 
the  random  variables  and  the  entire  influence  diagram  represents  the  joint  density  function  of  a 
Gaussian  random  vector.  The  rules  for  exchanging  nodes  and  for  node  removal  are  the  same  as  for 
the  discrete  form  of  the  influence  diagram.  Again,  the  influence  diagram  represents  the  conditioning 
relationships  while  the  actual  probability  density  functions  are  computed  separately. 

Another example will show the simple two-variable case. Assume two jointly Gaussian random
variables  x  and  y  make  up  a  random  vector.  As  stated  before,  the  joint  density  function  of  the 
random  variables  can  be  specified  by  a  two-dimensional  vector  of  means  and  a  two-by-two  covariance 
matrix.  The  influence  diagram  expresses  the  joint  density  as  the  mean  and  variance  of  one  variable, 
along  with  the  conditional  mean  and  conditional  variance  of  the  second,  given  the  first.  This 
conditional  density  is  Gaussian  because  any  conditional  density  of  jointly  Gaussian  random  variables 
is  Gaussian  also.  Either  representation  is  sufficient  to  specify  the  overall  joint  density  function. 

The two Gaussian random variables are x and y. The mean vector and covariance matrix are:

$$m = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix} \tag{11}$$

$$P = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \tag{12}$$

where $\mu_x$ and $\mu_y$ are the unconditional means of x and y respectively. Also, the notation
$\Sigma_{xy}$ denotes the covariance of the variables x and y and is equal to $\Sigma_{yx}$. Finally,
$\Sigma_{xx}$ represents the covariance of variable x with itself, or just the variance of x.

The conditional mean of x given y becomes
$\mu_{x|y} = \mu_x + \frac{\Sigma_{xy}}{\Sigma_{yy}}(\rho - \mu_y)$, where $\rho$ is a dummy
variable representing the possible realizations of the variable y. The quantity in parentheses is the
difference between the realization of the random variable and its predicted (mean) value. This term
will be called the residual. The conditional variance of x given y is
$\Sigma_{x|y} = \Sigma_{xx} - \frac{\Sigma_{xy}^2}{\Sigma_{yy}}$ [7:pp. 110-111].

The coefficient $\frac{\Sigma_{xy}}{\Sigma_{yy}} = \beta_{xy}$ has special significance. It is called
the regression coefficient of x on y. It represents the linear change in the conditional mean of x per
unit change in the realization of y. The notation $\beta_{xy}$ comes from Yule [17]. The order of the
subscripts is significant for the regression coefficient, so in general $\beta_{xy} \neq \beta_{yx}$.

The notation in the remainder of this thesis will follow the convention of Kenley rather than
Yule. Kenley uses the notation $b_{yx}$ instead of $\beta_{xy}$. Also, Kenley does not explicitly
write any other conditioning variables in the coefficient as Yule does. Instead, other conditioning
variables are implied by the influence diagram [6].


The regression coefficient is related to, but is not the same as, the correlation coefficient.
The correlation coefficient is given by
$r_{xy} = r_{yx} = \frac{\Sigma_{xy}}{\sqrt{\Sigma_{xx}\Sigma_{yy}}}$. The regression coefficient
can be calculated from the correlation coefficient by the relationship
$b_{yx} = r_{xy}\sqrt{\Sigma_{xx}/\Sigma_{yy}}$.
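
The two-variable relationships above can be sketched numerically. The covariance values below are hypothetical and chosen only to keep the arithmetic simple, and the function names are illustrative rather than taken from any reference.

```python
import math

def conditional_of_x_given_y(mu_x, mu_y, s_xx, s_xy, s_yy, rho):
    """Conditional mean and variance of x given the realization y = rho."""
    b_yx = s_xy / s_yy                    # regression coefficient of x on y
    mean = mu_x + b_yx * (rho - mu_y)     # mean shifted by the scaled residual
    var = s_xx - s_xy ** 2 / s_yy         # does not depend on the realization
    return mean, var

def regression_from_correlation(r_xy, s_xx, s_yy):
    """Regression coefficient of x on y from the correlation coefficient."""
    return r_xy * math.sqrt(s_xx / s_yy)

# Hypothetical numbers for illustration only.
s_xx, s_xy, s_yy = 4.0, 3.0, 9.0
r_xy = s_xy / math.sqrt(s_xx * s_yy)

print(regression_from_correlation(r_xy, s_xx, s_yy), s_xy / s_yy)
print(conditional_of_x_given_y(1.0, 2.0, s_xx, s_xy, s_yy, rho=5.0))
```

The first line prints the same coefficient computed two ways; the second shows that only the conditional mean, not the conditional variance, depends on the realized value.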


Either random variable could have been chosen to be represented in unconditional form.
If the order of conditioning were reversed, then the conditional mean of y given x would be
$\mu_{y|x} = \mu_y + \frac{\Sigma_{xy}}{\Sigma_{xx}}(\xi - \mu_x)$, where $\xi$ is now the dummy
variable representing the possible realizations of the variable x. The conditional variance of y given
x is $\Sigma_{y|x} = \Sigma_{yy} - \frac{\Sigma_{xy}^2}{\Sigma_{xx}}$. The term
$\frac{\Sigma_{xy}}{\Sigma_{xx}} = b_{xy}$ is called the regression coefficient of y on x. It too is
related to the correlation coefficient $r_{xy}$, this time by the relationship
$b_{xy} = r_{xy}\sqrt{\Sigma_{yy}/\Sigma_{xx}}$.

The influence diagram represents the joint density function as the marginal density of one
variable and the conditional density of the other. Assume an influence diagram and its underlying
density as shown in Figure 7. In this case, the random variable y is given in unconditional form, and
the random variable x is expressed in conditional form. Only five values are needed to represent the
joint distribution: the unconditional means of the two random variables, the unconditional variance
of y, the conditional variance of x, and the regression coefficient of x on y. In this figure, $v_i$ is the
variance of the ith node, whether it is in conditional or unconditional form. Because of the simple
method for describing the joint density, these values are shown directly on the influence diagram
rather than in a separate table. The means and variances are associated with the appropriate node,
while the regression coefficients are associated with the arrow between the nodes.


As  with  the  discrete  form  of  the  influence  diagram,  it  is  possible  to  reverse  the  arrow  between 
the  two  nodes.  This  is  equivalent  to  changing  the  order  of  conditioning  using  Bayes’  rule.  The 
mathematics for the Gaussian form of the influence diagram are different, however.

If the unconditional density of x is desired, then only the unconditional variance of x must
be computed (the unconditional mean already exists). The original calculation of the conditional
variance can be written in terms of the regression coefficient as
$\Sigma_{x|y} = \Sigma_{xx} - b_{yx}^2 \Sigma_{yy}$. This equation can be rearranged to give

$$\Sigma_{xx} = \Sigma_{x|y} + b_{yx}^2 \Sigma_{yy} \tag{13}$$

The second part of the reversal of these two nodes is the calculation of the conditional density
of y. The only values that need to be calculated are the conditional variance and the regression
coefficient of y on x. The conditional variance of y given x was already shown to be
$\Sigma_{y|x} = \Sigma_{yy} - \frac{\Sigma_{xy}^2}{\Sigma_{xx}}$. This equation can be simplified
as follows:

$$\Sigma_{y|x} = \frac{\Sigma_{yy}\Sigma_{xx} - \Sigma_{xy}^2}{\Sigma_{xx}} = \frac{\Sigma_{yy}\left(\Sigma_{xx} - \Sigma_{xy}^2 / \Sigma_{yy}\right)}{\Sigma_{xx}} = \frac{\Sigma_{yy}\Sigma_{x|y}}{\Sigma_{xx}} \tag{14}$$

Similarly, the regression coefficient of y on x is calculated by

$$b_{xy} = \frac{\Sigma_{xy}}{\Sigma_{xx}} = \frac{b_{yx}\Sigma_{yy}}{\Sigma_{xx}} \tag{15}$$


The  result  is  that  the  joint  density  function  is  expressed  as  an  unconditional  density  of  x  and  a 
conditional  density  of  y.  This  is  shown  in  the  influence  diagram  in  Figure  8. 


Figure  8.  Continuous  Gaussian  Influence  Diagram  after  Reversal 
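
The two-node reversal can be sketched as a single function that takes the three quantities stored on the diagram before the reversal and returns the three stored after it. The function name is an assumption, and the numbers in the usage lines are hypothetical; they correspond to a covariance matrix with variance of x equal to 4, variance of y equal to 9, and covariance 3.

```python
def reverse_two_nodes(v_y, v_x_given_y, b_yx):
    """Reverse the arc y -> x in a two-node Gaussian influence diagram.

    Inputs : var(y), var(x | y), regression coefficient of x on y.
    Returns: var(x), var(y | x), regression coefficient of y on x.
    """
    v_x = v_x_given_y + b_yx ** 2 * v_y      # Equation (13)
    v_y_given_x = v_y * v_x_given_y / v_x    # Equation (14)
    b_xy = b_yx * v_y / v_x                  # Equation (15)
    return v_x, v_y_given_x, b_xy

# Hypothetical diagram: var(y) = 9, var(x | y) = 4 - 3**2 / 9 = 3, b_yx = 3 / 9.
v_x, v_y_given_x, b_xy = reverse_two_nodes(v_y=9.0, v_x_given_y=3.0, b_yx=3.0 / 9.0)
print(v_x, v_y_given_x, b_xy)

# Reversing again should return the starting values.
print(reverse_two_nodes(v_x, v_y_given_x, b_xy))
```

Note that the means are untouched by the reversal; only the variances and the regression coefficient change.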


As in the discrete form of the influence diagram, the two-variable case is simplistic. The
calculations become slightly more complicated as the number of variables increases. If there are
n random variables in the Gaussian random vector, then one is chosen, perhaps arbitrarily, to
be represented in unconditional form. A second random variable is conditioned on the first as in
the previous example. The third random variable is represented as being conditioned on the first
two. There will be two regression coefficients associated with this third variable, one from the
first variable and one from the second. The process continues with the fourth variable conditioned
on the previous three and having three regression coefficients, etc. Eventually, the nth random
variable will be represented as a conditional density, conditioned on the previous n - 1 variables,
with a regression coefficient associated with each. The result can be written in the form of an
n-dimensional influence diagram. Using simplified notation where $f_{x_n}$ implies $f_{x_n}(\xi_n)$,
the diagram shown in Figure 9 can be expressed mathematically as:

$$f_{x_1, x_2, x_3, \ldots, x_n} = f_{x_1}\, f_{x_2|x_1}\, f_{x_3|x_1,x_2} \cdots f_{x_n|x_1,x_2,\ldots,x_{n-1}} \tag{16}$$


Figure  9.  Influence  Diagram  for  N  Jointly  Gaussian  Random  Variables 


Assume two variables are represented by two nodes in an influence diagram. If the nodes can
be placed next to each other in some ordered sequence, then the conditioning order can be reversed.
Call the first variable $x_i$ and the second $x_j$, where it is assumed that $x_i$ is a conditional
predecessor of $x_j$. Furthermore, assume that $x_K$ is a set containing the union of all direct
predecessor nodes of both $x_i$ and $x_j$, except for $x_i$ itself. Also assume that $x_k$ is an
arbitrary element of $x_K$. Nodes $x_i$ and $x_j$ have variances $v_i$ and $v_j$ respectively. The
term $b_{ij}$ is the regression coefficient of j on i; the terms $b_{ki}$ and $b_{kj}$ are the
regression coefficients of $x_i$ and $x_j$ on $x_k$.

In the multivariable case, the equations for calculating the new variances and regression
coefficients are similar to those in the preceding paragraphs for the two-variable case. Additionally,
both nodes will inherit each other’s direct predecessors, and the regression coefficients from
predecessor nodes must be adjusted to reflect the new conditioning order. The equations for
calculating the new variances and regression coefficients are as follows, where the prime symbols
represent a new value.

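For node j, no longer conditioned on node i, the new variance and regression coefficients from
predecessors are:

$$v_j' = v_j + b_{ij}^2 v_i \tag{17}$$

$$b_{kj}' = b_{kj} + b_{ki} b_{ij} \tag{18}$$

For node i, conditioned on node j, the new variance and regression coefficients are:

$$v_i' = \frac{v_i v_j}{v_j'} \tag{19}$$

$$b_{ji}' = \frac{b_{ij} v_i}{v_j'} \tag{20}$$

$$b_{ki}' = b_{ki} - b_{kj}' b_{ji}' \tag{21}$$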
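
The reversal equations (17) through (21) can be collected into one routine. The sketch below is an illustration under stated assumptions rather than a reference implementation: the function name and the use of dictionaries keyed by predecessor are choices made here, and an absent arc is represented by a coefficient of 0.0 so that both dictionaries share the same keys. The sample call uses the numerical values that are substituted into these equations in the worked example of the next section, with a single common predecessor labeled "k".

```python
def reverse_arc(v_i, v_j, b_ij, b_ki, b_kj):
    """Reverse the arc i -> j between two Gaussian nodes.

    v_i, v_j : variances of nodes i and j in their current (conditional) form
    b_ij     : regression coefficient on the arc i -> j
    b_ki     : {k: coefficient on arc k -> i} over the union of predecessors
    b_kj     : {k: coefficient on arc k -> j} over the same keys
    Returns (v_i_new, v_j_new, b_ji_new, b_ki_new, b_kj_new).
    """
    v_j_new = v_j + b_ij ** 2 * v_i                                   # (17)
    b_kj_new = {k: b_kj[k] + b_ki[k] * b_ij for k in b_kj}            # (18)
    v_i_new = v_i * v_j / v_j_new                                     # (19)
    b_ji_new = b_ij * v_i / v_j_new                                   # (20)
    b_ki_new = {k: b_ki[k] - b_kj_new[k] * b_ji_new for k in b_ki}    # (21)
    return v_i_new, v_j_new, b_ji_new, b_ki_new, b_kj_new

result = reverse_arc(v_i=8.5556, v_j=23.948, b_ij=1.6494,
                     b_ki={"k": 0.22222}, b_kj={"k": 0.077922})
v_i_new, v_j_new, b_ji_new, b_ki_new, b_kj_new = result
print(round(v_j_new, 3), round(b_kj_new["k"], 5))
print(round(v_i_new, 4), round(b_ji_new, 5), round(b_ki_new["k"], 5))
```

After the call, node j is conditioned only on the common predecessors, and node i is conditioned on those predecessors and on node j.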

2.4  Example  of  the  Gaussian  Influence  Diagram 

The following example does not use actual statistics and does not represent any actual
relationships. It is purely fictional and serves only to demonstrate the use of the influence diagram.

Suppose  that  an  analysis  of  statistics  from  a  group  of  professional  basketball  players  reveals 
a  relationship  between  the  height  of  the  player,  the  percentage  of  playing  time  in  a  game,  and  the 
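number of points per game. These three statistics are assumed to be jointly Gaussian with a mean
vector and covariance matrix:

$$m = \begin{bmatrix} \mu_h \\ \mu_p \\ \mu_t \end{bmatrix} = \begin{bmatrix} 82 \\ 20 \\ 75 \end{bmatrix}$$

$$P = \begin{bmatrix} \Sigma_{hh} & \Sigma_{hp} & \Sigma_{ht} \\ \Sigma_{ph} & \Sigma_{pp} & \Sigma_{pt} \\ \Sigma_{th} & \Sigma_{tp} & \Sigma_{tt} \end{bmatrix} = \begin{bmatrix} 9 & 2 & 4 \\ 2 & 9 & 15 \\ 4 & 15 & 49 \end{bmatrix}$$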

In  this  example,  the  subscript  h  refers  to  the  height  in  inches.  The  average  (mean)  height  is 
82  inches,  with  a  standard  deviation  of  3  inches.  The  subscript  p  is  the  number  of  points  scored  per 
game.  The  average  number  of  points  is  20,  with  a  standard  deviation  of  3  points.  The  subscript 
t  is  the  playing  time  expressed  as  a  percentage  of  total  game  time.  The  average  is  75%  with  a 
standard  deviation  of  7%. 

These  three  variables  can  be  expressed  in  an  influence  diagram  as  shown  in  Figure  10.  The 
node labeled H refers to the player’s height, the node labeled P is the average number of points per
game, and the node labeled T is the playing time as a percentage. The operations for calculating
the  influence  diagram  from  a  covariance  matrix  will  be  discussed  briefly  in  the  next  section.  The 
most  notable  feature  of  this  diagram  is  the  relationship  between  the  three  variables. 


Figure  10.  Example  of  Continuous  Gaussian  Influence  Diagram 


The regression coefficient for the number of points against the height is approximately 0.22222.
This means that each increase in height of one inch changes the conditional mean by 0.22222 points
per game. Similarly, an increase of one inch in height changes the conditional mean of the playing
time by 0.077922%. An increase of 1 point per game implies an increase in percentage of playing
time of 1.6494%. If both the height and the number of points are given, the change in the mean
playing time is the sum of the changes caused by the measured height and number of points scored.
The effect can be thought of as propagating the residual to all subsequent nodes.

The conditional mean and variance, as specified in this diagram, can be used to infer the
average playing time if a player’s height is given or if a player’s points per game are given.
The  total  effect  of  the  regression  coefficients  from  the  first  node  to  the  last  one  is  computed  by 
summing  the  effects  from  both  paths,  both  direct  and  indirect  through  the  second  node.  The  direct 
effect  is  the  regression  coefficient  on  the  path  from  the  first  to  the  third.  The  indirect  effect  is 
the  product  of  the  coefficients  on  the  path  from  the  first  to  the  second,  and  from  the  second  to 
the  third.  The  change  in  the  mean  of  the  last  node  due  to  the  realization  of  the  first  node  is 
(0.22222)(1.6494)  +  (0.077922)  =  0.44445. 

As an example, assume an 84-inch player scores 16 points in a game. The conditional mean of
the number of points, given the player’s height, is 20 + 0.22222(84 - 82) = 20.4444. The conditional
mean of the time played, given both the height and the number of points, is calculated by first
finding the effect due to the player’s height, then finding the effect due to the number of points
scored. The effect due to the height measurement is 75 + [0.44445(84 - 82)] = 75.8889%. At this
time,  the  height  node  is  no  longer  needed  on  the  influence  diagram.  The  effects  of  the  measurement 
have  been  propagated  to  all  subsequent  nodes.  In  the  influence  diagram,  the  realization  and  removal 
of  a  random  variable  is  called  “instantiation.”  Instantiated  nodes  must  be  at  the  beginning  of  the 
ordered  sequence  of  nodes  so  that  all  remaining  nodes  can  be  affected  by  the  realization  of  the 
random  variable. 

The second calculation uses the new conditional mean for the number of points scored to
calculate the average playing time. The calculation for the average playing time now becomes
75.8889 + [1.6494(16 - 20.44444)] = 68.558%. Again, the node corresponding to the realized random
variable can be instantiated. It was at the beginning of the ordered sequence because its
predecessor  was  removed  by  instantiation.  After  the  second  node  is  instantiated,  only  the  last  node 
remains.  The  result  is  the  conditional  mean  estimate  of  the  playing  time  for  this  player  of  68.558% 
with  a  (conditional)  standard  deviation  of  4.8937%.  Even  though  this  node  has  no  predecessors, 
it  retains  the  conditional  mean  and  variance  based  on  instantiated  variables.  For  the  remainder  of 
this  thesis,  a  node  with  no  predecessors  will  be  said  to  be  in  “unconditional  form”  even  though  it 
may  represent  a  conditional  distribution. 
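
The instantiation procedure of this example can be sketched as a short routine. It assumes the diagram is stored as an ordered list of nodes, a dictionary of means, and a dictionary of regression coefficients keyed by (predecessor, successor); these names and the layout are illustrative choices. When the leading node is realized, its residual is propagated to every later node through both direct and indirect paths, and the node is then removed.

```python
def instantiate(order, mean, b, observations):
    """Propagate realized values through an ordered Gaussian influence diagram.

    order        : nodes in an ordered sequence
    mean         : {node: current mean}
    b            : {(pred, succ): regression coefficient}; missing arcs count as 0
    observations : {node: realized value}; observed nodes must lead the order
    Returns the updated means of the nodes that were not instantiated.
    """
    mean = dict(mean)
    remaining = list(order)
    while remaining and remaining[0] in observations:
        node = remaining.pop(0)
        # Change in each mean caused by this realization, starting with the residual.
        delta = {node: observations[node] - mean[node]}
        for succ in remaining:
            delta[succ] = sum(b.get((pred, succ), 0.0) * d for pred, d in delta.items())
            mean[succ] += delta[succ]
        del mean[node]   # the realized node is removed from the diagram
    return mean

order = ["H", "P", "T"]
mean = {"H": 82.0, "P": 20.0, "T": 75.0}
b = {("H", "P"): 0.22222, ("H", "T"): 0.077922, ("P", "T"): 1.6494}

print(instantiate(order, mean, b, {"H": 84.0}))             # height only
print(instantiate(order, mean, b, {"H": 84.0, "P": 16.0}))  # height, then points
```

The variances are not updated by this routine because, for jointly Gaussian variables, the conditional variance stored at each node does not depend on the realized values.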

The influence diagram can also be rearranged as described in the previous section. In this
case, it will be rearranged so that the player’s height is last. This takes place in two separate
node reversal operations. First, the nodes labeled H and P are reversed to yield the first influence
diagram in Figure 11. Neither node has any other predecessor, so the calculations of the new
conditional and unconditional variances and of the new regression coefficient are:

$$v_P' = v_P + b_{HP}^2 v_H = 8.5556 + (0.22222)^2 (9) = 9.0000$$

$$v_H' = \frac{v_H v_P}{v_P'} = \frac{(9)(8.5556)}{9.0000} = 8.5556$$

$$b_{PH}' = \frac{b_{HP} v_H}{v_P'} = \frac{(0.22222)(9)}{9.0000} = 0.22222$$

Second, the nodes labeled H and T are reversed to yield the influence diagram in Figure 12.
Both nodes have node P as a predecessor so the calculations are:

$$v_j' = 23.948 + (1.6494)^2 (8.5556) = 47.224$$

$$b_{kj}' = 0.077922 + (0.22222)(1.6494) = 0.44445$$

$$v_i' = \frac{(8.5556)(23.948)}{47.224} = 4.3387$$

$$b_{ji}' = \frac{(8.5556)(1.6494)}{47.224} = 0.29882$$

$$b_{ki}' = 0.22222 - (0.44445)(0.29882) = 0.089409$$

Figure 11. First Reversal of Gaussian Influence Diagram Example (node labels, as mean and
variance: (20, 9), (82, 8.5556), (75, 23.948); one arc coefficient: 0.077922)

Figure 12. Second Reversal of Gaussian Influence Diagram Example

Again, the conditional mean and variance can be calculated, based on realizations of either or
both of the first two random variables in the diagram. In another example, assume a player scores
24 points in a game and plays 95% of the game. The conditional mean of the playing time, given
the number of points scored, is 75 + 0.44445(24 - 20) = 76.7778%. The conditional mean of the
player’s height, given both the number of points scored and the actual playing time, is calculated
in two steps. First, the total change due to the change in number of points is computed. The total
coefficient from the first to the last node is 0.089409 + (0.44445)(0.29882) = 0.22222. The new
average height, due to the realization of the number of points, is 82 + 0.22222(24 - 20) = 82.8889.
The first node is then instantiated, leaving only the last two nodes. Next, the change due to the
playing time is computed. This is 82.8889 + (0.29882)(95 - 76.7778) = 88.334. The second node is
now instantiated, leaving only the last node. The conditional mean estimate of the player’s height
is 88.334 inches with a (conditional) standard deviation of 2.0830 inches.

2.5  Matrix  Representation  of  the  Influence  Diagram

The  influence  diagram  is  a  convenient  tool  for  calculating  the  conditional  mean  and  variance 
of  an  element  of  a  Gaussian  random  vector.  It  is  a  visual  depiction  of  the  equations  for  the  Gaussian 
conditional  mean  vector  and  covariance  matrix: 

$$P_{x|y} = P_{xx} - P_{xy} P_{yy}^{-1} P_{yx} \tag{22}$$

$$m_{x|y} = m_x + P_{xy} P_{yy}^{-1} (\rho - m_y) \tag{23}$$

where $\rho$ is a dummy vector for the realization of the components of y, $P_{x|y}$ is the conditional
variance of x given y, and $m_{x|y}$ is the conditional mean of x given y [7:pp. 110-111].

In the first of these two equations, the conditional variance of a node in the influence diagram is
fixed  for  a  given  set  of  conditioning  variables.  The  influence  diagram  allows  convenient  recalculation 
of  the  conditional  variance  when  the  conditioning  variables  change. 

In  the  second  of  these  two  equations,  the  conditional  mean  is  seen  to  be  a  linear  function  of 
the  residuals  of  the  conditioning  variables.  This  linear  relationship  is  maintained  in  the  influence 
diagram  by  the  regression  coefficients. 
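
Equations (22) and (23) can be evaluated directly for the basketball example of the previous section, taking x as the playing time and y as the vector of height and points. The sketch below is written in plain Python for a scalar x and a two-element y, so the inverse of the 2-by-2 matrix is coded explicitly; the function name and argument layout are assumptions made for the illustration.

```python
def conditional_given_two(m_x, m_y, p_xx, p_xy, p_yy, rho):
    """Conditional mean and variance of a scalar x given a 2-vector y = rho.

    m_y, p_xy, and rho are length-2 lists; p_yy is a 2x2 nested list.
    Also returns the row vector P_xy P_yy^{-1}.
    """
    (a, b), (c, d) = p_yy
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]                       # P_yy^{-1}
    gain = [p_xy[0] * inv[0][j] + p_xy[1] * inv[1][j] for j in range(2)]   # P_xy P_yy^{-1}
    mean = m_x + sum(g * (r - m) for g, r, m in zip(gain, rho, m_y))       # Equation (23)
    var = p_xx - sum(g * s for g, s in zip(gain, p_xy))                    # Equation (22)
    return mean, var, gain

# Playing time given height = 84 inches and points = 16.
mean_t, var_t, gain = conditional_given_two(
    m_x=75.0, m_y=[82.0, 20.0], p_xx=49.0, p_xy=[4.0, 15.0],
    p_yy=[[9.0, 2.0], [2.0, 9.0]], rho=[84.0, 16.0])
print(round(mean_t, 3), round(var_t, 3), [round(g, 6) for g in gain])
```

The elements of the row vector returned as `gain` are the regression coefficients on the arcs into the playing-time node, which is the linear relationship the influence diagram maintains.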

The covariance matrix can be used to calculate both the matrix of regression coefficients and
the conditional variances for all random variables in the vector. The calculations are closely related
to the Cholesky decomposition of the covariance matrix. For example, let P be a positive definite,
symmetric covariance matrix. A factorization of P exists in the form $P = U^T S S U$ where U is a
unit upper triangular matrix and S is a diagonal matrix so that $S^T = S$. The factorization
$(SU)^T SU$ is identical to the Cholesky decomposition of the covariance matrix. The matrix
$SS = D$ is also a diagonal matrix so the covariance matrix can be represented as $U^T D U$.
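
A factorization of this form can be computed with a short routine. The sketch below is a plain-Python variant of the standard square-root-free Cholesky recursion; it is offered as an illustration and is not the more efficient method cited from the literature. The function names are assumptions. It is applied to the covariance matrix of the basketball example.

```python
def utdu(P):
    """Factor a symmetric positive definite matrix P as U^T D U.

    Returns (U, d): U is unit upper triangular (nested lists) and d holds
    the diagonal entries of D.
    """
    n = len(P)
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: variance left after removing the earlier variables.
        d[j] = P[j][j] - sum(U[k][j] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            U[j][i] = (P[j][i] - sum(U[k][j] * U[k][i] * d[k] for k in range(j))) / d[j]
    return U, d

def rebuild(U, d):
    """Form U^T D U, used only to check the factorization."""
    n = len(d)
    return [[sum(U[k][i] * d[k] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[9.0, 2.0, 4.0],
     [2.0, 9.0, 15.0],
     [4.0, 15.0, 49.0]]
U, d = utdu(P)
print([round(value, 4) for value in d])
print([[round(value, 5) for value in row] for row in U])
```

The printed diagonal and upper-triangular entries can be compared with the conditional variances and coefficients quoted for the example.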

Let  I  be  the  identity  matrix,  and  Bj  be  a  strictly  upper  triangular  matrix  that  is  all  zeros 
except  for  the  jth  column  above  the  diagonal.  The  elements  of  the  B;-  matrix  are  the  regression 
coefficients  6ij,  62;, . . . ,  where  the  subscripts  indicate  the  predecessor  and  the  successor  node. 
The  subscripts  also  correspond  to  the  conventional  row-column  notation  for  matrix  elements.  The 
matrix  Uj  is  a  unit  upper  triangular  matrix  defined  as  (I  -  B;),  and  t'.e  matrix  B  is  defined  as 
Bi  -f  Bo  -i-  B3  -b . . .  +  Bn  (where  the  convention  leads  to  Bi  =  0  and  Ui  =  I).  Kenley  and  Schacter 
showed  [12:pp.  547-548]: 

$$U = U_1 U_2 U_3 \cdots U_n \tag{24}$$

and

$$U = (I - B)^{-1} \tag{25}$$

The $D$ and $U$ matrices can be computed by taking the Cholesky decomposition of the covariance matrix, then factoring the $U_1, U_2, U_3, \ldots, U_n$ matrices from the $U$ matrix. A more efficient method is given in Kenley and Shachter [12:549].

For the example in the previous section, the matrices turn out to be:

$$U_3 = \begin{bmatrix} 1 & 0 & 0.077922 \\ 0 & 1 & 1.6494 \\ 0 & 0 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & 0.22222 & 0.44445 \\ 0 & 1 & 1.6494 \\ 0 & 0 & 1 \end{bmatrix}$$

$$D = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 8.5556 & 0 \\ 0 & 0 & 23.948 \end{bmatrix}$$

From this example, each term of the diagonal matrix $D$ is seen to be the conditional variance of the random variable, conditioned on the variables that precede it on the diagonal. The terms above the diagonal in the $U_j$ matrices are just the regression coefficients. The $b_{ij}$ coefficient indicates the direct linear change in the $j$th conditional mean due to a change in the mean of the $i$th variable. The terms of the $U$ matrix represent the total effect of all regression coefficients, both direct and indirect. In this example, the $U_{13}$ term is 0.44445. This is the value previously calculated as the effect of the measured height on the conditional mean of playing time.
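The relationship between the covariance matrix, the $U$ and $D$ factors, and the regression coefficients can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `udu_factor` and `regression_coefficients` are illustrative and do not come from the cited references. Because the covariance matrix of the earlier example is not repeated in this section, the sketch rebuilds it from the printed $U$ and $D$ and then recovers the factors through a Cholesky decomposition.

```python
import numpy as np

def udu_factor(P):
    """Return (U, D) with P = U.T @ D @ U, U unit upper triangular, D diagonal."""
    L = np.linalg.cholesky(P)   # P = L @ L.T with L lower triangular, so L = U.T @ S
    s = np.diag(L)              # diagonal entries of S
    U = (L / s).T               # divide column j of L by s[j], then transpose
    D = np.diag(s ** 2)         # D = S S holds the conditional variances
    return U, D

def regression_coefficients(U):
    """Return B from U = (I - B)^{-1}, that is, B = I - U^{-1}."""
    return np.eye(U.shape[0]) - np.linalg.inv(U)

# U and D as printed in the text; P is rebuilt from them for this demonstration.
U_text = np.array([[1.0, 0.22222, 0.44445],
                   [0.0, 1.0, 1.6494],
                   [0.0, 0.0, 1.0]])
D_text = np.diag([9.0, 8.5556, 23.948])
P = U_text.T @ D_text @ U_text

U, D = udu_factor(P)
B = regression_coefficients(U)
print(np.round(U, 5))           # total effects
print(np.round(np.diag(D), 4))  # conditional variances
print(np.round(B, 5))           # direct regression coefficients
```

The entry `B[0, 2]` is the direct coefficient $b_{13}$, while `U[0, 2]` also includes the indirect path through the second variable, $b_{12} b_{23}$.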

2.6  The  Influence  Diagram  for  Discrete-Time  Filtering 

Kenley proved in his doctoral dissertation that the influence diagram could be used for discrete-time filtering. The first of the following subsections discusses discrete-time Kalman filtering, taken from Maybeck [7:pp. 133-220], and discrete-time filtering with the influence diagram, taken from Kenley [6:pp. 52-106]. It is intended to present Kenley’s theories in parallel with the more widely known Kalman filtering algorithms. The second of the following subsections is a numerical example showing the operation of the influence diagram.

The  Kalman  filter  is  a  conditional  mean  estimator  of  the  states  of  a  linear  system,  under  the 
assumption  of  linear  measurements  and  Gaussian  disturbances.  Under  these  assumptions,  it  is  an 
optimal  estimator. 

The Gaussian influence diagram permits the calculation of the conditional density of Gaussian random variables in a jointly Gaussian random vector. If these variables represent the states of a linear system, then the influence diagram, too, is a conditional mean estimator for the states. Under these assumptions, it will yield the same state estimates and variances as the Kalman filter.

Assume a linear system of the form:

$$x(t_i) = \Phi(t_i, t_{i-1})\, x(t_{i-1}) + G_d(t_{i-1})\, w_d(t_{i-1}) \tag{26}$$

where $x(t_i)$ and $x(t_{i-1})$ are $n$-dimensional state vectors at times $t_i$ and $t_{i-1}$ respectively. $\Phi(t_i, t_{i-1})$ is the state transition matrix, which describes the propagation of the states from time $t_{i-1}$ to time $t_i$. The term $w_d(t_{i-1})$ represents the process noise and is assumed to be a discrete-time, zero-mean, white Gaussian noise sequence with covariance kernel

$$E\{w_d(t_i)\, w_d^T(t_j)\} = \begin{cases} Q_d(t_i), & t_i = t_j \\ 0, & t_i \neq t_j \end{cases} \tag{27}$$

The matrix $G_d(t_{i-1})$ represents a linear operation which describes how the discrete-time noise enters the system at time $t_{i-1}$. The vectors $x(t_{i-1})$ and $w_d(t_{i-1})$ are assumed to be independent.

Now assume that the state vector at time $t_{i-1}$ is a Gaussian random vector. The conditional mean of this vector, based on a priori information and measurements up to time $t_{i-1}$, is defined to be:

$$\hat{x}(t_{i-1}^+) = E\{x(t_{i-1}) \mid Z(t_{i-1})\} \tag{28}$$

where:

$$Z(t_{i-1}) = \begin{bmatrix} z(t_1) \\ z(t_2) \\ \vdots \\ z(t_{i-1}) \end{bmatrix} \tag{29}$$

In this equation, the measurement vector available at time $t_i$ is called $z(t_i)$, and $Z(t_{i-1})$ is the history of measurements through time $t_{i-1}$. At $t_0$, the state estimate is defined to be:

$$\hat{x}(t_0) = E\{x(t_0)\} \tag{30}$$

The conditional covariance of the state estimate at time $t_{i-1}$ is defined by:

$$P(t_{i-1}^+) = E\{[x(t_{i-1}) - \hat{x}(t_{i-1}^+)]\,[x(t_{i-1}) - \hat{x}(t_{i-1}^+)]^T \mid Z(t_{i-1})\} \tag{31}$$

and the unconditional covariance of the estimate at time $t_0$ is

$$P(t_0) = E\{[x(t_0) - \hat{x}(t_0)]\,[x(t_0) - \hat{x}(t_0)]^T\} \tag{32}$$

Because the state estimate at time $t_{i-1}$ is a Gaussian random vector, its density can be fully described by the conditional mean $\hat{x}(t_{i-1}^+)$ and the covariance matrix $P(t_{i-1}^+)$.

The state estimate at time $t_{i-1}$ can also be described in influence diagram form. The $n$-dimensional random vector is depicted as $n$ random variable nodes. The means of the nodes are the respective elements of the conditional mean vector $\hat{x}(t_{i-1}^+)$. The regression coefficients and the conditional variances of the individual nodes are equivalent to the covariance matrix $P(t_{i-1}^+)$. Such an $n$-dimensional influence diagram is shown in Figure 13. In this influence diagram, the first node is in unconditional form, but it represents the conditional mean and variance, conditioned on prior measurements. The prior measurements do not need to be shown because they represent random variables that have been realized and instantiated. The remaining nodes represent the conditional mean and variance, based on the instantiated nodes. The realizations of the instantiated nodes do not affect the conditional variances, but they do affect the conditional means. In this way, the conditional mean vector $\hat{x}(t_{i-1}^+)$ becomes a sufficient statistic for the measurement history $Z(t_{i-1})$.


Figure  13.  Influence  Diagram  for  N  Jointly  Gaussian  Random  Variables 


The linear model in Equation (26) describes the propagation of the states from time $t_{i-1}$ to time $t_i$. The estimate at time $t_i$ is also Gaussian, with a conditional density function of the states at time $t_i$, given the measurements through time $t_{i-1}$, defined by the conditional mean

$$\hat{x}(t_i^-) = E\{x(t_i) \mid Z(t_{i-1})\} \tag{33}$$

and the conditional covariance matrix

$$P(t_i^-) = E\{[x(t_i) - \hat{x}(t_i^-)]\,[x(t_i) - \hat{x}(t_i^-)]^T \mid Z(t_{i-1})\} \tag{34}$$

The conditional mean and covariance at time $t_i$ can be calculated by the Kalman filter propagation equations:

$$\hat{x}(t_i^-) = \Phi(t_i, t_{i-1})\, \hat{x}(t_{i-1}^+) \tag{35}$$

$$P(t_i^-) = \Phi(t_i, t_{i-1})\, P(t_{i-1}^+)\, \Phi^T(t_i, t_{i-1}) + G_d(t_{i-1})\, Q_d(t_{i-1})\, G_d^T(t_{i-1}) \tag{36}$$

The influence diagram can depict the linear model of Equation (26). As shown earlier, the conditional density of the state estimate at time $t_{i-1}$ corresponds to an influence diagram with $n$ nodes. The conditional means of the nodes are $\hat{x}(t_{i-1}^+)$, while the conditional covariance $P(t_{i-1}^+)$ is factored into the conditional variances of the respective nodes and the regression coefficients between them.

The $r$-dimensional, discrete-time, zero-mean random noise vector $w_d(t_{i-1})$ is depicted as $r$ nodes in influence diagram form. In Figure 14, the nodes of $w_d(t_{i-1})$ have no arrows drawn between them, implying that the covariance matrix $Q_d(t_{i-1})$ is diagonal. The mean of all nodes of $w_d(t_{i-1})$ is zero.

The vector $x(t_i)$ is depicted by a set of $n$ independent deterministic nodes. This is because it is a deterministic function of two independent Gaussian random vectors, $x(t_{i-1})$ and $w_d(t_{i-1})$. The linear relationship between $x(t_i)$ and $x(t_{i-1})$ is represented by the regression coefficients on the arrows from the nodes of $x(t_{i-1})$ to the nodes of $x(t_i)$. From Equation (26), this linear relationship is the matrix $\Phi(t_i, t_{i-1})$. Consequently, the regression coefficients on the arrows between the two vectors are the elements of the state transition matrix. The coefficient on the arrow between the $k$th node of $x(t_{i-1})$ and the $j$th node of $x(t_i)$ corresponds to the $(j,k)$ element of $\Phi(t_i, t_{i-1})$.

In this example, $Q_d(t_{i-1})$ is assumed to be diagonal and the linear operation corresponding to $G_d(t_{i-1})$ is depicted by arrows from the nodes of $w_d(t_{i-1})$ to $x(t_i)$. The coefficient on the arrow from the $k$th node of $w_d(t_{i-1})$ to the $j$th node of $x(t_i)$ is the $(j,k)$ element of $G_d(t_{i-1})$. The influence diagram in Figure 14 shows the vectors and matrices as just described.


As will be shown in Chapter 3, the matrix product $G_d(t_{i-1})\, w_d(t_{i-1})$ could have been factored such that $G_d(t_{i-1})$ is the identity matrix and $Q_d(t_{i-1})$ is an $n \times n$ matrix which is not diagonal. In such a case, $w_d(t_{i-1})$ will be depicted as $n$ nodes with arcs between them corresponding to the influence diagram factorization of the non-diagonal matrix $Q_d(t_{i-1})$. The identity matrix $G_d(t_{i-1})$ will be depicted as $n$ arrows. Each node of $w_d(t_{i-1})$ will have only one arrow leading from it to a single corresponding node of $x(t_i)$. The regression coefficients on these arrows will be unity.

The entire influence diagram in Figure 14 can be thought of as $2n + r$ jointly Gaussian random variables making up a Gaussian random vector. It represents the joint distribution of $x(t_i)$, $x(t_{i-1})$, and $w_d(t_{i-1})$. Because $w_d(t_{i-1})$ is assumed to be independent of previous states and measurements, the density function $f_{w_d(t_{i-1}) \mid x(t_{i-1}), Z(t_{i-1})}$ simplifies to $f_{w_d(t_{i-1})}$. The joint distribution of $x(t_{i-1})$, $x(t_i)$, and $w_d(t_{i-1})$ can then be written in conditional form as:

$$f_{x(t_i),\, x(t_{i-1}),\, w_d(t_{i-1}) \mid Z(t_{i-1})} = f_{x(t_{i-1}) \mid Z(t_{i-1})}\; f_{w_d(t_{i-1})}\; f_{x(t_i) \mid x(t_{i-1}),\, w_d(t_{i-1}),\, Z(t_{i-1})} \tag{37}$$

The desired density function is $f_{x(t_i) \mid Z(t_{i-1})}$. The nodes which represent this density function are those in the center column in Figure 14, labeled $x(t_i)$. Using the influence diagram to calculate this density function, the objective is to reverse the arcs in the diagram until the desired nodes are in unconditional form, i.e., at the beginning of the ordered sequence of nodes. This operation is equivalent to calculating

$$f_{x(t_i) \mid Z(t_{i-1})} = \frac{f_{x(t_{i-1}) \mid Z(t_{i-1})}\; f_{w_d(t_{i-1})}\; f_{x(t_i) \mid x(t_{i-1}),\, w_d(t_{i-1}),\, Z(t_{i-1})}}{f_{x(t_{i-1}),\, w_d(t_{i-1}) \mid x(t_i),\, Z(t_{i-1})}} \tag{38}$$

Equation (38) can be verified by multiplying both sides by the denominator of the right side. The result is Equation (37).

The nodes of $w_d(t_{i-1})$ and $x(t_{i-1})$ will be moved to the end of the ordered sequence of nodes, where they can be removed as “nuisance” variables. An efficient way of removing these nodes is to first make the nodes of $w_d(t_{i-1})$ conditioned upon the nodes of $x(t_i)$. This process starts with the most conditional node of $w_d(t_{i-1})$ and reverses arrows until it is conditioned on the entire vector $x(t_i)$ as well. When this node is at the end of the ordered sequence (at the end of $x(t_i)$) it can be removed. The process continues by reversing and removing the most conditional of the remaining nodes of $w_d(t_{i-1})$, one at a time. This results in a diagram with only $x(t_{i-1})$ and $x(t_i)$, shown as the first diagram in Figure 15.

Now the nodes of $x(t_{i-1})$ can be conditioned upon the nodes of $x(t_i)$ in the same manner, starting with the most conditional node. Alternatively, the least conditional (top) node of $x(t_i)$ can be made less conditional by reversing arrows with the nodes in $x(t_{i-1})$ until it is in unconditional form. Then the second node of $x(t_i)$ can be moved until it is conditioned only on the first. The process repeats until the last node of $x(t_i)$ is moved up. As each node of $x(t_{i-1})$ successively becomes a “nuisance” variable, it is removed. Both methods are equally efficient and result in the second diagram of Figure 15.

This remaining vector of nodes represents the desired density function $f_{x(t_i) \mid Z(t_{i-1})}$. The conditional mean of this vector is $\hat{x}(t_i^-)$, while the regression coefficients and conditional variances of the nodes represent a factored form of $P(t_i^-)$. The influence diagram algorithm renders the same conditional mean estimate and conditional covariance matrix as the Kalman filter propagation Equations (35) and (36).
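The equivalence between removing nuisance nodes and the Kalman propagation equations can be illustrated without any arc reversals, because for jointly Gaussian variables removing nodes amounts to keeping the corresponding block of the joint mean and covariance. The following is a minimal sketch under that assumption, using NumPy; the function names and the two-state numbers are hypothetical and are not taken from the text.

```python
import numpy as np

def propagate_kalman(x_hat, P, Phi, Gd, Qd):
    """Kalman filter propagation, Equations (35) and (36)."""
    return Phi @ x_hat, Phi @ P @ Phi.T + Gd @ Qd @ Gd.T

def propagate_joint(x_hat, P, Phi, Gd, Qd):
    """Build the joint Gaussian of [x(t_{i-1}); w_d; x(t_i)], then keep only x(t_i)."""
    n, r = P.shape[0], Qd.shape[0]
    # x(t_{i-1}) and w_d are independent, so their joint covariance is block diagonal.
    mean_src = np.concatenate([x_hat, np.zeros(r)])
    cov_src = np.block([[P, np.zeros((n, r))],
                        [np.zeros((r, n)), Qd]])
    # Stack the identity on top of the linear model of Equation (26).
    A = np.vstack([np.eye(n + r), np.hstack([Phi, Gd])])
    mean_joint = A @ mean_src
    cov_joint = A @ cov_src @ A.T
    # Dropping the nuisance variables selects the x(t_i) block.
    return mean_joint[n + r:], cov_joint[n + r:, n + r:]

# Hypothetical two-state example; none of these numbers come from the text.
x_hat = np.array([1.0, -0.5])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
Phi = np.array([[1.0, 0.1], [0.0, 0.9]])
Gd = np.array([[0.0], [1.0]])
Qd = np.array([[0.25]])

m_k, P_k = propagate_kalman(x_hat, P, Phi, Gd, Qd)
m_j, P_j = propagate_joint(x_hat, P, Phi, Gd, Qd)
print(np.allclose(m_k, m_j), np.allclose(P_k, P_j))
```

The influence diagram reaches the same marginal through a sequence of arc reversals on the factored form, which avoids forming the full joint covariance explicitly.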

Figure 15. Results of Removing the Vectors $w_d(t_{i-1})$ and $x(t_{i-1})$ (first diagram: influence diagram after $w_d(t_{i-1})$ removed; second diagram: influence diagram after $x(t_{i-1})$ removed)

The second part of the Kalman filter assumes a linear measurement model with additive discrete-time Gaussian disturbances. The model is represented in the form:

$$z(t_i) = H(t_i)\, x(t_i) + v(t_i) \tag{39}$$

or, as will be used later:

$$z(t_i) = H(t_i)\, x(t_i) + I\, v(t_i) \tag{40}$$

where $I$ is the identity matrix of appropriate dimension. The measurement vector $z(t_i)$ is a linear combination of the states plus a discrete-time Gaussian noise vector. The term $H(t_i)$ is the matrix which describes the linear combination of states. The term $v(t_i)$ is the zero-mean, discrete-time Gaussian noise with covariance kernel:

$$E\{v(t_i)\, v^T(t_j)\} = \begin{cases} R(t_i), & t_i = t_j \\ 0, & t_i \neq t_j \end{cases} \tag{41}$$


The new conditional density function of the states, given all measurements through time $t_i$ and a priori knowledge, is Gaussian. Because it is Gaussian, only the conditional mean and covariance matrix are needed to define the density function. The conditional mean of the state vector, conditioned on the measurements through time $t_i$, is $\hat{x}(t_i^+)$, where:

$$\hat{x}(t_i^+) = E\{x(t_i) \mid Z(t_i)\} \tag{42}$$

and the new covariance matrix of the state estimate at time $t_i$, conditioned on measurements through time $t_i$, is defined by:

$$P(t_i^+) = E\{[x(t_i) - \hat{x}(t_i^+)]\,[x(t_i) - \hat{x}(t_i^+)]^T \mid Z(t_i)\} \tag{43}$$

The Kalman filter calculates the updated conditional density function by first calculating a Kalman gain matrix $K(t_i)$, then using it to calculate both the conditional mean and covariance. The conditional mean is the estimate of the state vector. The applicable equations are:

$$K(t_i) = P(t_i^-)\, H^T(t_i)\, [H(t_i)\, P(t_i^-)\, H^T(t_i) + R(t_i)]^{-1} \tag{44}$$

$$\hat{x}(t_i^+) = \hat{x}(t_i^-) + K(t_i)\, [z_i - H(t_i)\, \hat{x}(t_i^-)] \tag{45}$$

$$P(t_i^+) = P(t_i^-) - K(t_i)\, H(t_i)\, P(t_i^-) \tag{46}$$

Figure 16 shows how the influence diagram can depict the linear measurement model. The $p$-dimensional measurement vector $z(t_i)$ is a deterministic function of the $n$-dimensional random vector $x(t_i)$ and the $p$-dimensional random vector $v(t_i)$. In this influence diagram, assume that the $n$-vector of nodes $x(t_i)$ is the same as the $n$-vector of nodes in Figure 13. As such, it represents the conditional density given in Equation (38). The vector $v(t_i)$ is a zero-mean, discrete-time, Gaussian noise. The covariance matrix $R(t_i)$ is represented in factored form by the variances of the nodes of $v(t_i)$. The lack of arcs between the nodes of $v(t_i)$ implies that $R(t_i)$ is a diagonal matrix.

The measurement matrix $H(t_i)$ is represented by the arrows from the nodes of $x(t_i)$ to $z(t_i)$. The regression coefficients on these arrows are the elements of $H(t_i)$. They represent the linear combination of states that makes up the measurement vector. The identity matrix in Equation (40) is represented by the arrows from the nodes of $v(t_i)$ to the nodes of $z(t_i)$. The regression coefficients on these arrows are all equal to one, corresponding to the ones in the identity matrix.

The desired density function is that of the vector $x(t_i)$ conditioned on the vector $z(t_i)$ (as well as on $Z(t_{i-1})$). In the influence diagram, this means that the nodes of $z(t_i)$ will be first in the ordered sequence, followed by the nodes of $x(t_i)$. The nodes of $v(t_i)$ are not needed and can be removed as nuisance variables. The required operations can take place in two steps.

The first step is to remove the nodes of $v(t_i)$. The arrows between $v(t_i)$ and $z(t_i)$ are reversed until each of the nodes of $v(t_i)$ is conditioned on $z(t_i)$ and removed as a nuisance variable. If $R(t_i)$ is diagonal as shown in Figure 16, then each deterministic node of $z(t_i)$ takes on the variance of the associated node of $v(t_i)$. This operation is shown in the first influence diagram in Figure 17.

If $R(t_i)$ is not diagonal, then there are two options. One option is a transformation of variables from $z(t_i)$ to $z^*(t_i)$, yielding a new $H^*(t_i)$ and $R^*(t_i)$ such that $R^*(t_i)$ is diagonal [7:pp. 375-377]. The other option is to factor $R(t_i)$ into influence diagram form and remove the nodes one at a time. With the second option, the nodes of $z(t_i)$ will have arrows between them after $v(t_i)$ is removed, just as the nodes of $x(t_i)$ had arrows between them in the top influence diagram in Figure 15.

Figure 16. Measurement Update Model for $z(t_i) = H(t_i)\, x(t_i) + v(t_i)$

The second step in the process is to reverse the arrows between the new $z(t_i)$ and $x(t_i)$. The distribution of the vector $x(t_i)$ is now conditioned on the vector $z(t_i)$. These operations are shown in the second influence diagram in Figure 17.

The linear operation depicted by the arrows from $z(t_i)$ to $x(t_i)$ in Figure 17 represents the Kalman gain matrix. However, the influence diagram in Figure 17 does not use the notation $K(t_i)$. This is because the regression coefficients on the arrows from $z(t_i)$ to $x(t_i)$ are not identical to the elements of the Kalman gain matrix. On the other hand, the total effect of all regression coefficients from node $i$ of $z(t_i)$ to node $j$ of $x(t_i)$ must be the $(j,i)$ element in the Kalman gain matrix. This total effect includes the direct effects of the regression coefficients between the two vectors, as well as the effects between components within each vector. This relationship will be demonstrated in the numeric example of the following subsection.
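For a single scalar measurement, the link between total effects and the Kalman gain can be checked directly by factoring the joint covariance of the measurement and the states into $U^T D U$ form with the measurement node ordered first. The sketch below assumes NumPy; the helper `udu_factor` and all numerical values are hypothetical illustrations and are not taken from the text.

```python
import numpy as np

def udu_factor(S):
    """Return (U, D) with S = U.T @ D @ U, U unit upper triangular, D diagonal."""
    L = np.linalg.cholesky(S)
    s = np.diag(L)
    return (L / s).T, np.diag(s ** 2)

# Hypothetical prior covariance, measurement row, and noise variance.
P_minus = np.array([[4.0, 1.0, 0.5],
                    [1.0, 3.0, 0.2],
                    [0.5, 0.2, 2.0]])
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[1.0]])

# Kalman gain and updated covariance from Equations (44) and (46).
S = H @ P_minus @ H.T + R
K = P_minus @ H.T @ np.linalg.inv(S)
P_plus = P_minus - K @ H @ P_minus

# Joint covariance of [z; x] given past measurements, with z ordered first.
joint = np.block([[S, H @ P_minus],
                  [P_minus @ H.T, P_minus]])
U, D = udu_factor(joint)

gain_from_diagram = U[0, 1:]        # total effect of z on each state node
U_x, D_x = U[1:, 1:], D[1:, 1:]     # factored form of x after conditioning on z
P_from_diagram = U_x.T @ D_x @ U_x

print(np.allclose(gain_from_diagram, K[:, 0]))
print(np.allclose(D[0, 0], S[0, 0]))
print(np.allclose(P_from_diagram, P_plus))
```

With several measurement nodes that have arcs among themselves, the first rows of $U$ would also carry effects routed through other measurement nodes, so this simple row-reading applies to the scalar case shown here.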

The final calculation is to instantiate the nodes of $z(t_i)$ by realizing the random variables in order and removing the nodes. With each realization, the difference between the mean and the realized value of the random variable (the residual) is propagated, via the regression coefficients, to all subsequent nodes. The remaining vector $x(t_i)$ is now conditioned on the measurements $z(t_i)$ and $Z(t_{i-1})$, or simply $Z(t_i)$.

Similar to the prior example, the entire influence diagram in Figure 16 can be thought of as $2p + n$ jointly Gaussian random variables making up a Gaussian random vector. It represents the joint distribution of $x(t_i)$, $z(t_i)$, and $v(t_i)$. Since $v(t_i)$ is independent of previous measurements and state estimates, $f_{v(t_i) \mid x(t_i),\, Z(t_{i-1})} = f_{v(t_i)}$. The joint distribution of $x(t_i)$, $z(t_i)$, and $v(t_i)$ can then be written in conditional form as:

$$f_{x(t_i),\, z(t_i),\, v(t_i) \mid Z(t_{i-1})} = f_{v(t_i)}\; f_{x(t_i) \mid Z(t_{i-1})}\; f_{z(t_i) \mid x(t_i),\, v(t_i),\, Z(t_{i-1})} \tag{47}$$

The desired density function is $f_{x(t_i) \mid Z(t_{i-1}),\, z(t_i)} = f_{x(t_i) \mid Z(t_i)}$. This density function is represented by the arrangement of nodes in the second influence diagram of Figure 17. In this diagram, the nodes of $x(t_i)$ are conditioned upon the nodes of $z(t_i)$. Because the nodes of $x(t_i)$ were already conditioned on $Z(t_{i-1})$, the resultant density function is conditioned on all measurements through time $t_i$, represented as $Z(t_i)$.

The objective of the influence diagram manipulations is to reverse arcs as necessary to achieve the desired form of the influence diagram. This operation is done in two steps. First, the top diagram in Figure 17 is the joint density of $x(t_i)$ and $z(t_i)$, expressed in influence diagram form as the marginal density of one vector and the conditional density of the second vector, given the first. The influence diagram operations represent the equations:

$$f_{x(t_i),\, z(t_i) \mid Z(t_{i-1})} = \frac{f_{x(t_i),\, z(t_i),\, v(t_i) \mid Z(t_{i-1})}}{f_{v(t_i) \mid z(t_i),\, x(t_i)}} \tag{48}$$

$$= \frac{f_{v(t_i)}\; f_{x(t_i) \mid Z(t_{i-1})}\; f_{z(t_i) \mid x(t_i),\, v(t_i),\, Z(t_{i-1})}}{f_{v(t_i) \mid z(t_i),\, x(t_i)}} \tag{49}$$

This equation can be verified by multiplying both sides by the denominator.

The top diagram of Figure 17 is further changed to the bottom diagram by rearranging the joint density of $x(t_i)$ and $z(t_i)$ into a conditional form with $x(t_i)$ conditioned on $z(t_i)$. The influence diagram operations represent the equations:

$$f_{x(t_i) \mid Z(t_{i-1}),\, z(t_i)} = \frac{f_{x(t_i),\, z(t_i) \mid Z(t_{i-1})}}{f_{z(t_i) \mid Z(t_{i-1})}} \tag{50}$$

$$= \frac{f_{x(t_i) \mid Z(t_{i-1})}\; f_{z(t_i) \mid x(t_i),\, Z(t_{i-1})}}{f_{z(t_i) \mid Z(t_{i-1})}} \tag{51}$$

At this time, the regression coefficients and conditional variances of $x(t_i)$ represent the conditional covariance matrix $P(t_i^+)$. However, the influence diagram represents the conditional mean in equation form as a function of the realization of the components of $z(t_i)$. To complete the operation, the components of $z(t_i)$ are realized one at a time, in order, and the appropriate nodes are instantiated. When all nodes of $z(t_i)$ are removed, the remaining nodes of $x(t_i)$ represent the conditional density $f_{x(t_i) \mid Z(t_i)}$ in numerical form.

Because the remaining density is Gaussian, it can be represented as the conditional mean vector and the conditional covariance matrix. Kenley showed that the conditional mean vector of $x(t_i)$ calculated by the influence diagram is identical to the Kalman filter estimate of the states, $\hat{x}(t_i^+)$. Furthermore, the conditional covariance matrix represented by the influence diagram is equivalent to $P(t_i^+)$.

The influence diagram representation of the discrete-time system models of Equations (26) and (40) over one time interval can be depicted as one large influence diagram. Such an influence diagram is shown in Figure 18.

Normally, a single time interval is one of many, perhaps infinitely many, intervals. The influence diagram can depict multiple time intervals, but the result is cluttered. One simplification assumes each node represents an entire random vector. This simplification allows depiction of a series of time intervals in a more orderly manner. Figure 19 depicts the first three time intervals of a linear system as given in Equations (26) and (40) in vector influence diagram form. Each node depicts an influence diagram for an entire random vector. An arrow between nodes represents a matrix linear operation between two vectors.

One  practical  problem  with  implementing  the  influence  diagram  over  many  intervals  is  that 
when  a  measurement  is  made,  the  residual  will  be  propagated  to  all  successor  nodes.  In  reality, 
there  may  be  many  successors.  Using  the  influence  diagram  to  propagate  the  new  conditional 
means  to  all  successors  may  not  be  practical.  Kenley’s  original  work  did  not  address  the  infinite 
successor  problem,  but  there  is  a  simple  solution,  as  will  be  shown  later. 

Figure 18. Influence Diagram Representation of Discrete-Time System Model

Figure 19. Vector Form of Discrete-Time System Model Over Several Time Intervals

2.6.1 Numerical Example of the Discrete-Time Kalman Filter. Assume a model of a one-dimensional target tracking problem. In this problem, the target’s position is the integral of the velocity, and the velocity is the integral of the acceleration. The acceleration is modeled as a continuous-time Gaussian white noise $w(t)$ through a first order lag filter. The equations of motion in state variable form are:

$$\dot{x}_1(t) = x_2(t) \tag{52}$$

$$\dot{x}_2(t) = x_3(t) \tag{53}$$

$$\dot{x}_3(t) = -\frac{1}{T}\, x_3(t) + w(t) \tag{54}$$

where $T$ is the time constant of the first order lag filter. In matrix form, these equations are:

$$\dot{x}(t) = F(t)\, x(t) + G(t)\, w(t) \tag{55}$$

$$\begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \dot{x}_3(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & -\frac{1}{T} \end{bmatrix} \begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} w(t) \tag{56}$$

The white, Gaussian noise model is zero mean with an autocorrelation kernel of:

$$E[w(t)\, w(t + \tau)] = Q\, \delta(\tau) \tag{57}$$

In this example, let $Q = 2/T$. This yields an autocorrelation kernel for $x_3(t)$ of:

$$E[x_3(t)\, x_3(t + \tau)] = e^{-|\tau|/T} \tag{58}$$

The state transition matrix can be computed, for a constant time interval of $\tau$, as:

$$\Phi(t + \tau, t) = \mathcal{L}^{-1}\{[sI - F]^{-1}\} = \begin{bmatrix} 1 & \tau & T^2(\tau/T - 1 + e^{-\tau/T}) \\ 0 & 1 & T(1 - e^{-\tau/T}) \\ 0 & 0 & e^{-\tau/T} \end{bmatrix} \tag{59}$$

where $\mathcal{L}^{-1}\{\cdot\}$ indicates the inverse Laplace transform. Assume $T = 2$ and $\tau = 1$ so that the sample time is less than the time constant of the first order lag filter. The state transition matrix, to four significant digits, is:

$$\Phi(t + 1, t) = \begin{bmatrix} 1 & 1 & 0.4261 \\ 0 & 1 & 0.7870 \\ 0 & 0 & 0.6065 \end{bmatrix} \tag{60}$$
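The numerical state transition matrix can be cross-checked against the matrix exponential $e^{F\tau}$, which is what the inverse Laplace transform in Equation (59) evaluates for a time-invariant $F$. The following is a minimal sketch, assuming NumPy; the function names are illustrative, and the truncated Taylor series is used only to keep the example self-contained.

```python
import math
import numpy as np

def phi_closed_form(T, tau):
    """State transition matrix in the closed form of Equation (59)."""
    e = math.exp(-tau / T)
    return np.array([[1.0, tau, T ** 2 * (tau / T - 1.0 + e)],
                     [0.0, 1.0, T * (1.0 - e)],
                     [0.0, 0.0, e]])

def phi_series(F, tau, terms=30):
    """Truncated Taylor series of the matrix exponential exp(F * tau)."""
    n = F.shape[0]
    total = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ (F * tau) / k   # (F tau)^k / k!
        total = total + term
    return total

T, tau = 2.0, 1.0
F = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, -1.0 / T]])

Phi = phi_closed_form(T, tau)
print(np.round(Phi, 4))
print(np.allclose(Phi, phi_series(F, tau)))
```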


The discrete-time equivalent noise $w_d$ is zero mean, and has covariance kernel

$$E\{w_d(t_i)\, w_d^T(t_j)\} = \begin{cases} Q_d(t_i) = \int_0^1 \Phi(1,s)\, G Q G^T\, \Phi^T(1,s)\, ds, & t_i = t_j \\ 0, & t_i \neq t_j \end{cases} \tag{61}$$

where $G$ and $Q$ are time invariant. The result of the integration is a time-invariant, discrete-time equivalent covariance matrix $Q_d$ calculated to four significant digits as:

$$Q_d = \begin{bmatrix} 3.063 & -2.336 & -0.5677 \\ -2.336 & 1.904 & 0.4160 \\ -0.5677 & 0.4160 & 0.1080 \end{bmatrix} \tag{62}$$

The matrix $G_d$ is assumed to be the identity matrix. Alternatively, the matrix product $G_d Q_d G_d^T$ could take on the value assigned to $Q_d$ and an arbitrary $G_d$ could be factored from the product (for instance, let $G_d$ be the U-factor and $Q_d$ be the D-factor of $G_d Q_d G_d^T$, as in the U-D filter [7:396]).

The initial estimates of position, velocity, and acceleration, and the initial covariance matrix, are:

$$\hat{x}(t_0) = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \tag{63}$$

$$P(t_0) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{64}$$

Measurements are taken at the end of the time interval $\tau$ according to the model:

$$z(t_i) = H(t_i)\, x(t_i) + v(t_i) \tag{65}$$

$$z(t_i) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} x(t_i) + v(t_i) \tag{66}$$

where $v(t_i)$ is a discrete-time, zero-mean, Gaussian noise with covariance kernel:

$$E\{v(t_i)\, v(t_j)\} = \begin{cases} 1, & t_i = t_j \\ 0, & t_i \neq t_j \end{cases} \tag{67}$$


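The Kalman filter state estimate and covariance matrix at time $t_1$, before the measurement is made, are calculated by Equations (35) and (36). These equations yield:

$$\hat{x}(t_1^-) = \begin{bmatrix} 1 & 1 & 0.4261 \\ 0 & 1 & 0.7870 \\ 0 & 0 & 0.6065 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \tag{68}$$

$$\hat{x}(t_1^-) = \begin{bmatrix} 2.426 \\ 1.787 \\ 0.6065 \end{bmatrix} \tag{69}$$

$$P(t_1^-) = \begin{bmatrix} 1 & 1 & 0.4261 \\ 0 & 1 & 0.7870 \\ 0 & 0 & 0.6065 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0.4261 \\ 0 & 1 & 0.7870 \\ 0 & 0 & 0.6065 \end{bmatrix}^T + \begin{bmatrix} 3.063 & -2.336 & -0.5677 \\ -2.336 & 1.904 & 0.4160 \\ -0.5677 & 0.4160 & 0.1080 \end{bmatrix} \tag{70}$$

$$P(t_1^-) = \begin{bmatrix} 5.244 & -1.001 & -0.3092 \\ -1.001 & 3.523 & 0.8933 \\ -0.3092 & 0.8933 & 0.4760 \end{bmatrix} \tag{71}$$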

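Equations (44) through (46) give the state estimate and covariance matrix at time $t_1$, after the measurement. Assume the measurement of the position, according to Equation (39), results in an observation of 2.000. Then the Kalman filter equations result in:

$$H P(t_1^-) H^T + R = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 5.244 & -1.001 & -0.3092 \\ -1.001 & 3.523 & 0.8933 \\ -0.3092 & 0.8933 & 0.4760 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + [1] \tag{72}$$

$$= 6.244 \tag{73}$$

$$K(t_1) = \begin{bmatrix} 5.244 & -1.001 & -0.3092 \\ -1.001 & 3.523 & 0.8933 \\ -0.3092 & 0.8933 & 0.4760 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} (1/6.244) \tag{74}$$

$$K(t_1) = \begin{bmatrix} 0.8399 \\ -0.1603 \\ -0.04952 \end{bmatrix} \tag{75}$$

$$\hat{x}(t_1^+) = \begin{bmatrix} 2.426 \\ 1.787 \\ 0.6065 \end{bmatrix} + \begin{bmatrix} 0.8399 \\ -0.1603 \\ -0.04952 \end{bmatrix} \left( 2.000 - \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2.426 \\ 1.787 \\ 0.6065 \end{bmatrix} \right) \tag{76}$$

$$\hat{x}(t_1^+) = \begin{bmatrix} 2.068 \\ 1.855 \\ 0.6276 \end{bmatrix} \tag{77}$$

$$P(t_1^+) = \begin{bmatrix} 5.244 & -1.001 & -0.3092 \\ -1.001 & 3.523 & 0.8933 \\ -0.3092 & 0.8933 & 0.4760 \end{bmatrix} - \begin{bmatrix} 0.8399 \\ -0.1603 \\ -0.04952 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 5.244 & -1.001 & -0.3092 \\ -1.001 & 3.523 & 0.8933 \\ -0.3092 & 0.8933 & 0.4760 \end{bmatrix} \tag{78}$$

$$P(t_1^+) = \begin{bmatrix} 0.8399 & -0.1603 & -0.04952 \\ -0.1603 & 3.363 & 0.8437 \\ -0.04952 & 0.8437 & 0.4607 \end{bmatrix} \tag{79}$$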
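The arithmetic of this example can be reproduced in a few lines. The sketch below assumes NumPy and uses the four-digit values of $\Phi$, $Q_d$, $\hat{x}(t_0)$, $P(t_0)$, $H$, $R$, and the observation exactly as printed in this subsection; the variable names are illustrative. Because the inputs are rounded, the results agree with Equations (69), (71), (75), (77), and (79) only to about three or four significant digits.

```python
import numpy as np

# Model values as printed in Equations (60), (62), (63), (64), (66), and (67).
Phi = np.array([[1.0, 1.0, 0.4261],
                [0.0, 1.0, 0.7870],
                [0.0, 0.0, 0.6065]])
Qd = np.array([[3.063, -2.336, -0.5677],
               [-2.336, 1.904, 0.4160],
               [-0.5677, 0.4160, 0.1080]])
Gd = np.eye(3)
x0 = np.array([1.0, 1.0, 1.0])
P0 = np.eye(3)
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[1.0]])
z = np.array([2.000])

# Propagation, Equations (35) and (36).
x_minus = Phi @ x0
P_minus = Phi @ P0 @ Phi.T + Gd @ Qd @ Gd.T

# Measurement update, Equations (44) through (46).
S = H @ P_minus @ H.T + R
K = P_minus @ H.T @ np.linalg.inv(S)
x_plus = x_minus + K @ (z - H @ x_minus)
P_plus = P_minus - K @ H @ P_minus

print(np.round(x_minus, 4))
print(np.round(P_minus, 4))
print(np.round(K[:, 0], 4))
print(np.round(x_plus, 4))
print(np.round(P_plus, 4))
```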
2.6.2 Numerical Example of Discrete-Time Influence Diagram Filter. The influence diagram can be used to calculate the conditional probability densities for the state estimates at time $t_1$, as was done by the Kalman filter equations in the previous subsection. The following series of figures shows the influence diagram operations for this numerical example. Figure 20 depicts the entire model for the propagation of the states from time $t_0$ to time $t_1$, as well as the measurement model at time $t_1$. The labels on the nodes correspond to the nomenclature in the model Equations (26) and (40).

Figure 21 is the same diagram with the nodes numbered one through eleven. This permits simpler labels on subsequent influence diagrams. The second diagram also assumes that the initial state estimates (the unconditional mean of the vector $x(t_0)$) have been propagated to all successor nodes by the regression coefficients. This is equivalent to assuming the a priori information is a realization of the random vector $x(t_0)$, but the nodes of the random vector remain on the diagram and are not instantiated.


Figure  21.  Example  of  Discrete-Time  Filter  with  Nodes  Relabeled 


Each  node  has  a  mean  and  variance  associated  with  it,  given  in  the  form  (^,,u,).  The 
regression  coefficients  are  written  on  the  right  side  of  the  diagram,  rather  than  directly  on  the  lines, 
for  legibility.  All  values  on  the  influence  diagram  come  from  the  following  matrix  equations. 


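$$\Phi(t_1, t_0) = \begin{bmatrix} 1 & 1 & 0.4261 \\ 0 & 1 & 0.7870 \\ 0 & 0 & 0.6065 \end{bmatrix} \tag{80}$$

$$Q_d(t_0) = \begin{bmatrix} 3.063 & -2.336 & -0.5677 \\ -2.336 & 1.904 & 0.4160 \\ -0.5677 & 0.4160 & 0.1080 \end{bmatrix} \tag{81}$$

$$Q_d(t_0) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -0.2922 & -0.1400 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -0.7628 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3.063 & 0 & 0 \\ 0 & 0.1218 & 0 \\ 0 & 0 & 0.0004764 \end{bmatrix} \begin{bmatrix} 1 & -0.7628 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -0.2922 \\ 0 & 1 & -0.1400 \\ 0 & 0 & 1 \end{bmatrix} \tag{82}$$

$$G_d(t_0) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{83}$$

$$P(t_0) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{84}$$

$$\hat{x}(t_0) = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \tag{85}$$

$$H(t_1) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \tag{86}$$

$$R(t_1) = \operatorname{var}[v(t_1)] = 1 \tag{87}$$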
Each diagram starting with Figure 22 depicts a single operation of arc reversal, node removal, or measurement update. These subsequent diagrams also show the variance on the left side of the diagram for legibility. The means are not shown until they change during the update. At the top of each of these diagrams, there is a short description of the changes between the current and the previous diagram. The equations for calculating the changes in the variances and regression coefficients are repeated here in Equations (88) through (92). The actual calculations are not shown, but the predecessor node (node i), the successor node (node j), and the common predecessors of both (the set of nodes K) are identified on each diagram. Finally, the changed values are emphasized in each diagram.


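$$v_j' = v_j + b_{ij}^2 v_i \tag{88}$$

$$b_{kj}' = b_{kj} + b_{ki} b_{ij} \tag{89}$$

$$v_i' = v_i - \frac{b_{ij}^2 v_i^2}{v_j'} = \frac{v_i v_j}{v_j'} \tag{90}$$

$$b_{ji}' = \frac{b_{ij} v_i}{v_j'} \tag{91}$$

$$b_{ki}' = b_{ki} - b_{kj}' b_{ji}' \tag{92}$$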
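As a check on the arc-reversal arithmetic, the following Python sketch applies the standard Gaussian arc-reversal update (new successor variance, inherited coefficients, reversed coefficient) to the step from Figure 27 to Figure 28. The function name `reverse_arc` and the dictionary layout are illustrative assumptions, not part of the original implementation; only the arcs touched by this reversal are included.

```python
def reverse_arc(v, b, i, j, common):
    """Reverse the arc i -> j in a Gaussian influence diagram (illustrative sketch).

    v      : dict mapping node -> conditional variance
    b      : dict mapping (predecessor, successor) -> regression coefficient
    common : nodes K that become predecessors of both i and j
    """
    b_ij = b.pop((i, j))
    v_i, v_j = v[i], v[j]
    # Node j inherits the predecessors of node i.
    for k in common:
        b[(k, j)] = b.get((k, j), 0.0) + b.get((k, i), 0.0) * b_ij
    # New variance of j, now no longer conditioned on i.
    v[j] = v_j + b_ij ** 2 * v_i
    # Reversed coefficient and new variance of i (guard the degenerate case).
    b_ji = b_ij * v_i / v[j] if v[j] > 0.0 else 0.0
    v[i] = v_i * v_j / v[j] if v[j] > 0.0 else 0.0
    b[(j, i)] = b_ji
    # Node i inherits the (updated) predecessors of node j.
    for k in common:
        b[(k, i)] = b.get((k, i), 0.0) - b[(k, j)] * b_ji


# Values taken from Figure 27 (nodes 3 and 4 and their common predecessors 1, 2).
v = {3: 1.0, 4: 3.063}
b = {(1, 4): 1.0, (2, 4): 1.0, (3, 4): 0.4261}
reverse_arc(v, b, i=3, j=4, common=[1, 2])
```

After the call, `v` and `b` agree with the values listed for Figure 28 to the precision shown there.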

As stated earlier, the total effect of the regression coefficients from the measurement node to the state estimates is identical to the Kalman gain matrix. Furthermore, the variance of the measurement node, just prior to the measurement in Figure 38, is identical to the value calculated for $H P(t_1^-) H^T + R$ in the Kalman filter. The means of the nodes are identical to the vector $\hat{x}(t_1^-)$. The covariance matrix represented by the variances and regression coefficients in Figure 36 is the same as the covariance matrix calculated with the Kalman filter in the previous subsection as $P(t_1^-)$. This can be verified by multiplying the matrices as:


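$$P(t_1^-) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -0.01118 & 0.2504 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -0.1908 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 5.244 & 0 & 0 \\ 0 & 3.332 & 0 \\ 0 & 0 & 0.24886 \end{bmatrix} \begin{bmatrix} 1 & -0.1908 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -0.01118 \\ 0 & 1 & 0.2504 \\ 0 & 0 & 1 \end{bmatrix} \tag{93}$$

$$P(t_1^-) = \begin{bmatrix} 5.244 & -1.001 & -0.3092 \\ -1.001 & 3.523 & 0.8933 \\ -0.3092 & 0.8933 & 0.4760 \end{bmatrix} \tag{94}$$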

Similarly,  the  covariance  matrix  in  Figure  39  can  be  verified  by  multiplying: 


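$$P(t_1^+) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -0.01118 & 0.2504 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -0.1908 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0.8399 & 0 & 0 \\ 0 & 3.332 & 0 \\ 0 & 0 & 0.24886 \end{bmatrix} \begin{bmatrix} 1 & -0.1908 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -0.01118 \\ 0 & 1 & 0.2504 \\ 0 & 0 & 1 \end{bmatrix} \tag{95}$$

$$P(t_1^+) = \begin{bmatrix} 0.8399 & -0.1603 & -0.04952 \\ -0.1603 & 3.363 & 0.8437 \\ -0.04952 & 0.8437 & 0.4607 \end{bmatrix} \tag{96}$$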
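The two matrix products above can be reproduced numerically. The sketch below rebuilds a covariance matrix from the conditional variances and regression coefficients of nodes 4, 5, and 6, using the factorization $U^T D U$ with $U$ formed as a product of single-column factors. The helper names are assumptions made for illustration, and indices 0, 1, 2 stand for nodes 4, 5, 6.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def covariance_from_diagram(variances, coeffs):
    """Return U^T D U, where U = U_2 U_3 ... U_n and each U_j is the identity
    plus the regression coefficients into node j placed in column j."""
    n = len(variances)
    identity = [[float(r == c) for c in range(n)] for r in range(n)]
    u = identity
    for j in range(1, n):
        u_j = [row[:] for row in identity]
        for i in range(j):
            u_j[i][j] = coeffs.get((i, j), 0.0)
        u = matmul(u, u_j)
    d = [[variances[r] if r == c else 0.0 for c in range(n)] for r in range(n)]
    u_t = [list(col) for col in zip(*u)]
    return matmul(matmul(u_t, d), u)


# Regression coefficients b45, b46, b56 (indices 0, 1, 2 stand for nodes 4, 5, 6).
coeffs = {(0, 1): -0.1908, (0, 2): -0.01118, (1, 2): 0.2504}
p_minus = covariance_from_diagram([5.244, 3.332, 0.24886], coeffs)   # Figure 36
p_plus = covariance_from_diagram([0.8399, 3.332, 0.24886], coeffs)   # Figure 39
```

Only the variance of node 4 differs between the two calls, which reflects the fact that the scalar measurement changes a single conditional variance in this ordering.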

2.6.3 Chapter Summary. This chapter was a tutorial approach to influence diagrams for probabilistic and deterministic variables. It was intended to summarize the work of numerous authors. It began with the first type of influence diagram created, the discrete variable type. The basic workings of the diagram were shown and explained using an example. The next type of influence diagram used Gaussian random variables. The mathematics of this type of diagram were explained using another example. The final result was an explanation and demonstration of the Gaussian influence diagram for discrete-time filtering. There was also a demonstration of some of the similarities between the Kalman filter and the influence diagram.


The  purpose  of  the  explanations  in  this  chapter  was  to  prepare  the  reader  for  more  detailed 
discussion  of  the  influence  diagram  in  subsequent  chapters.  The  implementation  of  the  discrete-time 
filter was due to Kenley [6], but the next chapter will demonstrate some changes in implementation
that  will  make  the  discrete-time  filter  algorithm  more  efficient. 


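Variances: $v_1 = 1$, $v_2 = 1$, $v_3 = 1$, $v_4 = 0$, $v_5 = 0$, $v_6 = 0.0004764$, $v_7 = 3.063$, $v_8 = 0.1218$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{14} = 1$, $b_{24} = 1$, $b_{25} = 1$, $b_{34} = 0.4261$, $b_{35} = 0.7870$, $b_{36} = 0.6065$, $b_{74} = 1$, $b_{76} = -0.2921$, $b_{78} = -0.7628$, $b_{85} = 1$, $b_{86} = -0.1400$, $b_{4,11} = 1$, $b_{10,11} = 1$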


Figure  22.  Example  of  Discrete-Time  Filter  with  Node  9  Removed 
Reverse node 8 to node 5. i={8}, j={5}, K={2,3,7}

Variances: $v_1 = 1$, $v_2 = 1$, $v_3 = 1$, $v_4 = 0$, $v_5 = 0.1218$, $v_6 = 0.0004764$, $v_7 = 3.063$, $v_8 = 0$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{14} = 1$, $b_{24} = 1$, $b_{25} = 1$, $b_{28} = -1$, $b_{34} = 0.4261$, $b_{35} = 0.7870$, $b_{36} = 0.6065$, $b_{38} = -0.7870$, $b_{58} = 1$, $b_{74} = 1$, $b_{75} = -0.7628$, $b_{76} = -0.2921$, $b_{78} = 0$, $b_{86} = -0.1400$, $b_{4,11} = 1$, $b_{10,11} = 1$


Figure  23.  Example  of  Discrete-Time  Filter  with  Arc  Between  Nodes  8  and  5  Reversed 
Remove node 8. i={8}, j={6}, K={2,3,5,7}

Variances: $v_1 = 1$, $v_2 = 1$, $v_3 = 1$, $v_4 = 0$, $v_5 = 0.1218$, $v_6 = 0.0004764$, $v_7 = 3.063$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{14} = 1$, $b_{24} = 1$, $b_{25} = 1$, $b_{26} = 0.1400$, $b_{34} = 0.4261$, $b_{35} = 0.7870$, $b_{36} = 0.7167$, $b_{56} = -0.1400$, $b_{74} = 1$, $b_{75} = -0.7628$, $b_{76} = -0.2921$, $b_{4,11} = 1$, $b_{10,11} = 1$


Figure  24.  Example  of  Discrete-Time  Filter  with  Node  8  Removed 
Figure 25. Example of Discrete-Time Filter with Arc Between Nodes 7 and 4 Reversed

Reverse node 7 to node 5. i={7}, j={5}, K={1,2,3,4}

Variances: $v_1 = 1$, $v_2 = 1$, $v_3 = 1$, $v_4 = 3.063$, $v_5 = 0.1218$, $v_6 = 0.0004764$, $v_7 = 0$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{14} = 1$, $b_{15} = 0.7628$, $b_{17} = -1$, $b_{24} = 1$, $b_{25} = 1.7628$, $b_{26} = 0.1400$, $b_{27} = -1$, $b_{34} = 0.4261$, $b_{35} = 1.112$, $b_{36} = 0.7167$, $b_{37} = -0.4261$, $b_{45} = -0.7628$, $b_{47} = 1$, $b_{57} = 0$, $b_{56} = -0.1400$, $b_{76} = -0.2921$, $b_{4,11} = 1$, $b_{10,11} = 1$


Figure  26.  Example  of  Discrete-Time  Filter  with  Arc  Between  Nodes  7  and  5  Reversed 
Remove node 7. i={7}, j={6}, K={1,2,3,4,5}

Variances: $v_1 = 1$, $v_2 = 1$, $v_3 = 1$, $v_4 = 3.063$, $v_5 = 0.1218$, $v_6 = 0.0004764$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{14} = 1$, $b_{15} = 0.7628$, $b_{16} = 0.2921$, $b_{24} = 1$, $b_{25} = 1.7628$, $b_{26} = 0.4322$, $b_{34} = 0.4261$, $b_{35} = 1.112$, $b_{36} = 0.8412$, $b_{45} = -0.7628$, $b_{46} = -0.2921$, $b_{56} = -0.1400$, $b_{4,11} = 1$, $b_{10,11} = 1$


Figure  27.  Example  of  Discrete-Time  Filter  with  Node  7  Removed 




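Reverse node 3 to node 4. i={3}, j={4}, K={1,2}

Variances: $v_1 = 1$, $v_2 = 1$, $v_3 = 0.9440$, $v_4 = 3.244$, $v_5 = 0.1218$, $v_6 = 0.0004764$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{13} = -0.1313$, $b_{14} = 1$, $b_{15} = 0.7628$, $b_{16} = 0.2921$, $b_{23} = -0.1313$, $b_{24} = 1$, $b_{25} = 1.7628$, $b_{26} = 0.4322$, $b_{35} = 1.112$, $b_{36} = 0.8412$, $b_{43} = 0.1313$, $b_{45} = -0.7628$, $b_{46} = -0.2921$, $b_{56} = -0.1400$, $b_{4,11} = 1$, $b_{10,11} = 1$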


Figure  28.  Example  of  Discrete-Time  Filter  with  Arc  Between  Nodes  3  and  4  Reversed 
Figure 29. Example of Discrete-Time Filter with Arc Between Nodes 3 and 5 Reversed

Figure 30. Example of Discrete-Time Filter with Node 3 Removed


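Reverse node 2 to node 4. i={2}, j={4}, K={1}

Variances: $v_1 = 1$, $v_2 = 0.7644$, $v_4 = 4.244$, $v_5 = 1.289$, $v_6 = 0.06360$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{12} = -0.2356$, $b_{14} = 1$, $b_{15} = 0.6167$, $b_{16} = -0.2408$, $b_{25} = 1.6167$, $b_{26} = -0.7858$, $b_{42} = 0.2356$, $b_{45} = -0.6167$, $b_{46} = 0.2408$, $b_{56} = 0.5450$, $b_{4,11} = 1$, $b_{10,11} = 1$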


Figure  31.  Example  of  Discrete-Time  Filter  with  Arc  Between  Nodes  2  and  4  Reversed 
Figure 32. Example of Discrete-Time Filter with Arc Between Nodes 2 and 5 Reversed

Figure 34. Example of Discrete-Time Filter with Arc Between Nodes 1 and 4 Reversed

Remove node 1. i={1}, j={6}, K={4,5}

Variances: $v_4 = 5.244$, $v_5 = 3.332$, $v_6 = 0.2489$, $v_{10} = 1$, $v_{11} = 0$

Regression coefficients: $b_{45} = -0.1908$, $b_{46} = -0.01118$, $b_{56} = 0.2504$, $b_{4,11} = 1$, $b_{10,11} = 1$


Figure  36.  Example  of  Discrete-Time  Filter  with  Node  1  Removed 


Figure  37.  Example  of  Discrete-Time  Filter  with  Node  10  Removed 
Figure 38. Example of Discrete-Time Filter with Arc Between Nodes 4 and 11 Reversed


Instantiate node 11. Realization is 2.000.
Propagate the residual (-0.4261).

Node (mean, variance): node 4: (2.068, 0.8399); node 5: (1.855, 3.332); node 6: (0.6276, 0.2489)

Regression coefficients: $b_{45} = -0.1908$, $b_{46} = -0.01118$, $b_{56} = 0.2504$

Figure  39.  Example  of  Discrete-Time  Filter  with  Node  11  Instantiated  and  Means  Updated 
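The numbers in Figure 39 can be reproduced with a few lines of arithmetic. The sketch below assumes the prior means of nodes 4, 5, and 6 are the entries of $\Phi(t_1,t_0)\hat{x}(t_0)$ computed from the example matrices, and that the measurement node is node 4 plus a noise node of variance 1 (node 10). Variable names are illustrative only.

```python
# Prior means of nodes 4, 5, 6: Phi(t1, t0) applied to x_hat(t0) = [1, 1, 1]
# (derived here from the example matrices; an assumption of this sketch).
prior_means = {4: 1 + 1 + 0.4261, 5: 1 + 0.7870, 6: 0.6065}

v4 = 5.244          # variance of node 4 before the measurement (Figure 36)
r_var = 1.0         # variance of the measurement-noise node 10
b = {(4, 5): -0.1908, (4, 6): -0.01118, (5, 6): 0.2504}
measurement = 2.000

# Reversing the arc from node 4 to node 11 yields the scalar gain and the
# reduced variance of node 4.
s = v4 + r_var                 # variance of the measurement node
gain = v4 / s
v4_post = v4 * r_var / s

# Propagate the residual through the regression coefficients.
residual = measurement - prior_means[4]
delta = {4: gain * residual}
delta[5] = b[(4, 5)] * delta[4]
delta[6] = b[(4, 6)] * delta[4] + b[(5, 6)] * delta[5]
post_means = {node: prior_means[node] + delta[node] for node in prior_means}
```

The residual comes out as -0.4261, and `post_means` and `v4_post` match the (mean, variance) pairs in Figure 39 to the digits shown.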
III. Implementing the Influence Diagram


The  previous  chapter  explained  Kenley’s  implementation  of  the  influence  diagram.  This 
chapter  picks  up  on  Kenley’s  original  work  and  demonstrates  three  improvements  to  the  original 
algorithm.  One  improvement  is  a  solution  to  the  infinite  successor  problem,  mentioned  in  the 
previous  chapter.  This  problem  occurs  because  the  influence  diagram  uses  the  terms  of  the  U 
matrix  to  calculate  the  changes  to  the  conditional  means  of  successor  nodes.  When  there  are  a 
large  number  of  successors,  U  is  very  large  and  it  becomes  impractical  to  propagate  the  changes  of 
the  means. 

A  second  improvement  is  a  more  efficient  method  of  realizing  a  vector  of  measurements.  In 
the  previous  chapter,  the  algorithm  for  updating  the  conditional  means  was  a  form  of  scalar  update. 
When  a  node  in  unconditional  form  was  realized,  the  residual  was  propagated  to  all  successor  nodes 
via  the  regression  coefficients.  After  the  realization,  the  original  node  was  no  longer  needed  and 
was  instantiated  (removed).  After  the  first  node  was  removed,  the  second  node  in  the  vector  was 
in  unconditional  form,  and  it  too  could  be  realized  and  removed.  The  process  continued,  and  the 
measurement  nodes  were  removed  one  at  a  time,  until  all  measurements  were  accomplished. 

The  third  improvement  is  a  demonstration  that,  under  certain  circumstances,  the  influence 
diagram can be more efficient than Kenley showed originally [6:pp. 89-106]. The conditions for
this  reduced  operation  count  will  be  shown.  Also,  Kenley  stated,  but  did  not  show,  that  the 
influence  diagram  can  be  used  in  a  parallel  processing  architecture.  This  thesis  will  examine  the 
implementation  of  the  influence  diagram  in  a  parallel  processing  environment,  and  compare  the 
improvement  in  processing  time. 

Before  discussing  these  improvements,  it  will  be  useful  to  define  two  terms  which  will  be  used 
later in this thesis, the “path product” and the “path coefficient.” These terms both represent
scalar  functions  of  the  regression  coefficients. 
A path product will be defined as the product of the regression coefficients on a given path from a predecessor node to a successor node. A path coefficient will be defined as the sum of the path products on all possible paths from a predecessor node to a successor node. An example will demonstrate the use of these terms.

Consider an influence diagram of five nodes, numbered sequentially 1 through 5. The path products from node 1 to node 5, grouped by the first arc on the path, are:


$$b_{15} \tag{97}$$

$$b_{14} b_{45} \tag{98}$$

$$b_{13}(b_{35} + b_{34} b_{45}) \tag{99}$$

$$b_{12}[b_{25} + b_{24} b_{45} + b_{23}(b_{35} + b_{34} b_{45})] \tag{100}$$

The sum of these path products results in the path coefficient, which will be denoted by the term $u_{15}$.

In more general terms, let $u_{m,n}$ be the path coefficient from a node m to a successor node n, and let $b_{m,n}$ be the regression coefficient on the arrow from a node m to its direct successor node n. Another expression for $u_{1,r}$ is:

$$u_{1,r} = b_{1,r} + b_{1,r-1}(u_{r-1,r}) + b_{1,r-2}(u_{r-2,r}) + \cdots + b_{1,2}(u_{2,r}) \tag{101}$$

As was shown in Chapter 2, the linear change in the conditional mean of a successor node, conditioned on the realization of a predecessor, is equal to the path coefficient from the first to the second.
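The recursion for the path coefficient can be sketched directly in Python. The function below is an illustrative helper (its name and the dictionary layout are assumptions), and the coefficient values are arbitrary numbers chosen only to exercise the formula; nodes are assumed to be ordered so that every arc runs from a lower to a higher number.

```python
def path_coefficient(b, m, n):
    """Sum of the path products over all directed paths from node m to node n.

    b maps (predecessor, successor) -> regression coefficient; a missing key
    means there is no arc. The path coefficient of a node to itself is 1.
    """
    if m == n:
        return 1.0
    return sum(b.get((m, k), 0.0) * path_coefficient(b, k, n)
               for k in range(m + 1, n + 1))


# Arbitrary illustrative coefficients for a fully connected five-node diagram.
b = {(1, 2): 2.0, (1, 3): 3.0, (1, 4): 5.0, (1, 5): 7.0,
     (2, 3): 11.0, (2, 4): 13.0, (2, 5): 17.0,
     (3, 4): 19.0, (3, 5): 23.0, (4, 5): 29.0}

u15 = path_coefficient(b, 1, 5)

# The same quantity written out as the four grouped path products.
u15_explicit = (b[(1, 5)]
                + b[(1, 4)] * b[(4, 5)]
                + b[(1, 3)] * (b[(3, 5)] + b[(3, 4)] * b[(4, 5)])
                + b[(1, 2)] * (b[(2, 5)] + b[(2, 4)] * b[(4, 5)]
                               + b[(2, 3)] * (b[(3, 5)] + b[(3, 4)] * b[(4, 5)])))
```

The plain recursion revisits shared sub-paths, so for large diagrams one would cache intermediate results; that refinement is omitted here for brevity.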

Let a positive definite, symmetric covariance matrix be factored into a unit lower triangular matrix, a diagonal matrix, and a unit upper triangular matrix equal to the transpose of the unit lower triangular matrix (a unit triangular matrix is defined to have ones on the main diagonal). Such a factorization is expressed as $U^T D U$ (or equivalently $L D L^T$, where $U = L^T$) and is unique [13:pp. 133-143]. Using the first notation, the U matrix can be factored as a product of unit upper triangular matrices as $U = U_1 U_2 U_3 \cdots U_n$, where n is the dimension of U. Each matrix $U_j$ is an identity matrix, plus nonzero terms only in the jth column above the diagonal. As discussed in Chapter 2, Kenley and Shachter showed that such a factorization was equivalent to an influence diagram representation of the covariance matrix [12:pp. 547-548]. The regression coefficients $b_{ij}$ are the nonzero terms above the diagonal in the $U_j$ matrix. Using the newly defined terminology, the path coefficients are the $u_{ij}$ terms above the diagonal in the U matrix.


As an example, consider a 4-dimensional covariance matrix factored as described above. The matrix D is

$$D = \begin{bmatrix} v_1 & 0 & 0 & 0 \\ 0 & v_2 & 0 & 0 \\ 0 & 0 & v_3 & 0 \\ 0 & 0 & 0 & v_4 \end{bmatrix} \tag{102}$$

where $v_1, v_2, v_3, v_4 > 0$. The matrix U is expressed as


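$$U = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & b_{12} & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & b_{13} & 0 \\ 0 & 1 & b_{23} & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & b_{14} \\ 0 & 1 & 0 & b_{24} \\ 0 & 0 & 1 & b_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{103}$$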


or in terms of the path coefficients:

$$U = \begin{bmatrix} 1 & u_{12} & u_{13} & u_{14} \\ 0 & 1 & u_{23} & u_{24} \\ 0 & 0 & 1 & u_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{104}$$
The expression for the path coefficient, in terms of the regression coefficients as in Equation (101), can be verified by matrix multiplication. In the case of $u_{14}$, $u_{24}$, and $u_{34}$ for example, the multiplication yields:

$$u_{34} = b_{34} \tag{105}$$

$$u_{24} = b_{24} + b_{23} b_{34} \tag{106}$$

$$u_{14} = b_{14} + b_{13} b_{34} + b_{12}(b_{24} + b_{23} b_{34}) \tag{107}$$
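The matrix multiplication can be carried out numerically as a quick check. The sketch below builds the single-column factors $U_2$, $U_3$, $U_4$ for a four-node diagram and multiplies them in order; the coefficient values are arbitrary illustrative numbers and the helper names are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_factor(n, j, b):
    """Identity matrix plus the regression coefficients b[(i, j)], i < j,
    placed in column j above the diagonal (nodes are numbered from 1)."""
    u_j = [[float(r == c) for c in range(n)] for r in range(n)]
    for i in range(1, j):
        u_j[i - 1][j - 1] = b.get((i, j), 0.0)
    return u_j


# Arbitrary illustrative regression coefficients.
b = {(1, 2): 2.0, (1, 3): 3.0, (1, 4): 5.0,
     (2, 3): 7.0, (2, 4): 11.0, (3, 4): 13.0}

n = 4
u = column_factor(n, 1, b)           # U_1 is the identity
for j in range(2, n + 1):
    u = matmul(u, column_factor(n, j, b))

u14, u24, u34 = u[0][3], u[1][3], u[2][3]
```

The order of multiplication matters: multiplying the factors in the reverse order would simply place each regression coefficient in its own position, without forming any path products.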

3.1 The Recursive Algorithm

The influence diagram algorithm has a problem when implementing the discrete-time filter over many, possibly infinitely many, time intervals. It assumes that when a measurement is made, the residual will be propagated to all successors. In a recursive discrete-time filter with a possibly infinite number of successors, propagating the mean to all of them is not possible.

One implementation scheme avoids the infinite successor problem. Assume a vector of nodes labeled $x(t_i)$ conditions another vector $z(t_i)$ and the influence diagram contains no other nodes. Such an influence diagram was shown in Figure 17. Also assume that the vector $x(t_i)$ represents the state estimate at time $t_i$, before incorporating the measurements at time $t_i$. The vector $z(t_i)$ represents the measurement vector at the same time. As shown in the second diagram in Figure 17, the nodes of $z(t_i)$ are moved to the beginning of the ordered sequence, and the random variables associated with those nodes are realized. The change in mean is propagated to all nodes of $x(t_i)$, and the nodes of $z(t_i)$ are removed. At this time, $x(t_i)$ is conditioned on $z(t_i)$, the actual measurements. If $x(t_i)$ were followed by other vectors representing $x(t_{i+1})$, $x(t_{i+2})$, and so on, then the influence diagram requires that the residuals be propagated to these vectors as well. The operation is equivalent to calculating the conditional means of $x(t_{i+1}), x(t_{i+2}), x(t_{i+3}), \ldots, x(t_{i+n})$, conditioned on the measurements at time $t_i$. This prediction information is not normally desired.
Assume now that the residuals are not propagated to subsequent vectors. Then $x(t_i)$ would have the correct conditional mean, but subsequent vectors would not. The state estimate at any time $t_i$ will only be calculated when it is needed, normally at time $t_i$. The equation for doing this is the Kalman filter update equation:

$$\hat{x}(t_i^-) = \Phi(t_i, t_{i-1})\,\hat{x}(t_{i-1}^+) \tag{108}$$

The measurement estimates $z(t_{i+1}), z(t_{i+2}), z(t_{i+3}), \ldots, z(t_{i+n})$ are also successors of $x(t_i)$ because they are successors of $x(t_{i+1}), x(t_{i+2}), x(t_{i+3}), \ldots, x(t_{i+n})$. The means could be propagated from $x(t_i)$ to form the measurement estimates at these times, conditioned on the measurement at time $t_i$. This information is again not normally needed, and such calculations are wasted effort. The only estimate usually needed is the estimate (prediction) of the measurements at time $t_i$, conditioned on measurements through time $t_{i-1}$. This can be calculated, as in the Kalman filter, as

$$\hat{z}(t_i^-) = H(t_i)\,\hat{x}(t_i^-) \tag{109}$$

To summarize this implementation method, the influence diagram should be used to calculate the conditional mean of the states at time $t_i$, based on the measurements at time $t_i$. However, when there are many successors, representing later time intervals, the influence diagram should not be used to calculate the conditional means for all of them. Instead, the conditional means for later states and measurements should be calculated only when needed using Equations (108) and (109) above.

Kenley’s  original  work  did  not  discuss  deterministic  inputs  to  the  model.  An  equation  that 
can  represent  deterministic  inputs  to  the  system  is  given  by 


$$x(t_i) = \Phi(t_i, t_{i-1})\,x(t_{i-1}) + G_d(t_{i-1})\,w_d(t_{i-1}) + B_d(t_{i-1})\,u_d(t_{i-1}) \tag{110}$$

where $u_d(t_{i-1})$ is the deterministic input and $B_d(t_{i-1})$ is the matrix that describes how the input affects the system states [7:220].

In the influence diagram, a deterministic input $u_d(t_{i-1})$ can be modeled as a deterministic vector which is a predecessor of the $x(t_i)$ vector. The arrows connecting $u_d(t_{i-1})$ to $x(t_i)$ are the $B_d(t_{i-1})$ matrix. The mean of this deterministic vector is assumed to be the zero vector so that the actual inputs can be modeled as a change of mean to the deterministic nodes, and then propagated. Using strict influence diagram rules, these changes of mean must be propagated to all subsequent nodes. Using the same logic as earlier though, propagating the conditional mean to later times is not useful. Instead, the matrix equation for the conditional mean at time $t_i$, including deterministic inputs at time $t_{i-1}$, is

$$\hat{x}(t_i^-) = \Phi(t_i, t_{i-1})\,\hat{x}(t_{i-1}^+) + B_d(t_{i-1})\,u_d(t_{i-1}) \tag{111}$$
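A minimal Python sketch of this "compute only when needed" scheme follows. It propagates the mean one interval ahead, optionally adds a deterministic input term, and forms the predicted measurement. The function names are assumptions made for illustration; the numbers are the state transition matrix, initial mean, and measurement matrix of the numerical example in Section 2.6.2.

```python
def mat_vec(a, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a_rc * x_c for a_rc, x_c in zip(row, x)) for row in a]


def propagate_mean(phi, x_prev, b_d=None, u_d=None):
    """Mean of the state one interval later; the deterministic-input term
    B_d u_d is added only when both b_d and u_d are supplied."""
    x_pred = mat_vec(phi, x_prev)
    if b_d is not None and u_d is not None:
        x_pred = [x + c for x, c in zip(x_pred, mat_vec(b_d, u_d))]
    return x_pred


def predict_measurement(h, x_pred):
    """Predicted measurement: H applied to the propagated state mean."""
    return mat_vec(h, x_pred)


phi = [[1.0, 1.0, 0.4261],
       [0.0, 1.0, 0.7870],
       [0.0, 0.0, 0.6065]]
x_prev = [1.0, 1.0, 1.0]
h = [[1.0, 0.0, 0.0]]

x_pred = propagate_mean(phi, x_prev)      # no deterministic input in the example
z_pred = predict_measurement(h, x_pred)
residual = 2.000 - z_pred[0]              # realization of 2.000, as in Figure 39
```

The residual of -0.4261 is the value propagated in Figure 39, so the on-demand calculation supplies exactly what the influence diagram update needs at the current time without touching later time steps.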


3.2 Vector Measurement Update

When  there  is  more  than  one  random  variable  to  be  realized  (measurement),  then  there  are 
two  methods  for  incorporating  the  measurements.  In  one  case,  the  nodes  are  moved  upwards  and 
removed  immediately  as  they  become  unconditional.  In  the  other  case,  all  nodes  are  moved  upwards 
until  the  entire  vector  of  measurements  is  unconditional  (the  nodes  of  the  vector  are  conditioned 
only  on  other  nodes  in  the  vector),  then  all  the  measurements  are  incorporated  and  the  measurement 
nodes  removed. 

There are advantages to each update method. If a node of the measurement vector is removed, then the arrows to subsequent nodes of the vector are not needed. As other nodes of the measurement vector are made unconditional, there are fewer predecessors and fewer calculations required. On the other hand, the change in mean at any node is the product of the change in mean of the predecessor (measurement node) times the path coefficient between the two. Thus, propagating the mean requires calculating the path coefficients from each node in the measurement vector to each successor node. This process is equivalent to recalculating the matrix of path coefficients $U = U_1 U_2 U_3 \cdots U_n$ for every measurement made. For an n-dimensional matrix, recalculating U requires $\frac{1}{6}(n^3 - 3n^2 + 2n)$ additions and multiplies.

Figure 40. Measurement at Two Nodes Propagated to Two Successor Nodes

There is a much more efficient method of updating the means which is of the order $n^2$. It is equivalent to a vector update and it requires the entire measurement vector in unconditional form. It does not explicitly calculate the path coefficients, but calculates the change of mean directly. It will be demonstrated by example.

Assume four nodes of a vector z in an influence diagram as shown in Figure 40. The first two nodes will be updated by a measurement. The second two are conditioned on the first two. The means will be propagated as the measurements are made. Also assume $\mu_i$ is the mean of the ith node before the update, and $\zeta_i$ is the measured value for i=1,2. If a single prime indicates the updated mean after only $\zeta_1$ is propagated, then let $r_1$ be the calculated residual and use the conventional method:

$$\mu_2' = \mu_2 + b_{12}(\zeta_1 - \mu_1) \tag{112}$$

$$\phantom{\mu_2'} = \mu_2 + u_{12} r_1 \tag{113}$$

$$\mu_3' = \mu_3 + (b_{13} + b_{12} b_{23})(\zeta_1 - \mu_1) \tag{114}$$

$$\phantom{\mu_3'} = \mu_3 + u_{13} r_1 \tag{115}$$

$$\mu_4' = \mu_4 + [b_{12}(b_{24} + b_{23} b_{34}) + b_{13} b_{34} + b_{14}](\zeta_1 - \mu_1) \tag{116}$$

$$\phantom{\mu_4'} = \mu_4 + u_{14} r_1 \tag{117}$$


Let a double prime represent the mean after the second update, that is, the measurement $\zeta_2$. Similarly, $r_2'$ is the residual $(\zeta_2 - \mu_2')$. This residual uses $\mu_2'$, the updated mean of node two. But the updated mean is just the mean of node 2 conditioned on node 1 having a mean of $\zeta_1$. Call the term $r_2'$ the conditional residual. It is this conditional residual which is propagated as follows:

$$\mu_3'' = \mu_3' + b_{23}(\zeta_2 - \mu_2') \tag{118}$$

$$\phantom{\mu_3''} = \mu_3' + u_{23} r_2' \tag{119}$$

$$\mu_4'' = \mu_4' + (b_{24} + b_{23} b_{34})(\zeta_2 - \mu_2') \tag{120}$$

$$\phantom{\mu_4''} = \mu_4' + u_{24} r_2' \tag{121}$$

After simplifying the expressions for $\mu_3''$ and $\mu_4''$ in terms of the unprimed variables, the following equations result:

$$\mu_3'' = b_{13}(\zeta_1 - \mu_1) + b_{23}(\zeta_2 - \mu_2) + \mu_3 \tag{122}$$

$$\mu_4'' = (b_{14} + b_{13} b_{34})(\zeta_1 - \mu_1) + (b_{24} + b_{23} b_{34})(\zeta_2 - \mu_2) + \mu_4 \tag{123}$$

An alternative form for $\mu_4''$ can be calculated by assuming node 3 had a measurement as well. Let this new measurement be $\zeta_3$ and calculate using the mean of node 3 conditioned on both previous measurements. The result appears as:

$$\mu_4'' = b_{14}(\zeta_1 - \mu_1) + b_{24}(\zeta_2 - \mu_2) + b_{34}(\zeta_3 - \mu_3) + \mu_4 \tag{124}$$

By letting the assumed $\zeta_3$ be the previously calculated $\mu_3''$, the previous expression for $\mu_4''$ results.
The insight is that the new mean of node 3 (propagated from previous nodes) can be treated as a
measurement  at  node  3,  and  the  resulting  residual  propagated  to  the  next  node.  The  same  effect 
can  be  extended  to  node  4.  The  new  mean  of  node  4,  calculated  by  propagating  the  change  of 
mean  from  its  predecessors,  can  be  treated  as  a  measurement  at  node  4,  and  the  resulting  residual 
propagated  to  the  next  node.  The  argument  extends  by  induction  to  any  number  of  nodes.  For  a 
change  of  mean  that  is  not  the  true  realization  of  the  random  variable,  such  as  for  nodes  3  and  4, 
the  variance  is  not  set  to  zero. 
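The equivalence claimed above can be checked numerically. The sketch below performs the two sequential scalar updates with path coefficients, then compares the result with the simplified expressions and with the "assumed measurement at node 3" form. All coefficient, mean, and measurement values are arbitrary illustrative numbers, not taken from the thesis.

```python
# Arbitrary illustrative regression coefficients, prior means, and measurements.
b12, b13, b14 = 0.5, -0.25, 2.0
b23, b24, b34 = 1.5, -1.0, 0.75
mu1, mu2, mu3, mu4 = 1.0, 2.0, 3.0, 4.0
zeta1, zeta2 = 1.5, 1.0

# Path coefficients needed for the sequential scalar updates.
u12 = b12
u13 = b13 + b12 * b23
u14 = b14 + b13 * b34 + b12 * (b24 + b23 * b34)
u23 = b23
u24 = b24 + b23 * b34

# First scalar update: propagate the residual of node 1.
r1 = zeta1 - mu1
mu2_p = mu2 + u12 * r1
mu3_p = mu3 + u13 * r1
mu4_p = mu4 + u14 * r1

# Second scalar update: propagate the conditional residual of node 2.
r2_p = zeta2 - mu2_p
mu3_pp = mu3_p + u23 * r2_p
mu4_pp = mu4_p + u24 * r2_p

# Simplified forms in terms of the unprimed means.
mu3_direct = b13 * r1 + b23 * (zeta2 - mu2) + mu3
mu4_direct = (b14 + b13 * b34) * r1 + (b24 + b23 * b34) * (zeta2 - mu2) + mu4

# Treat the new mean of node 3 as if it were a measurement at node 3.
zeta3 = mu3_pp
mu4_alt = b14 * r1 + b24 * (zeta2 - mu2) + b34 * (zeta3 - mu3) + mu4
```

All three routes to the mean of node 4 agree, and only the last one avoids forming any path coefficient.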

To find a useful algorithm, use the term $\zeta_i$ to refer to the new mean of node i after all updates, including possible updates to node i itself. Use the term $r_i$ to be the residual $(\zeta_i - \mu_i)$, where $\mu_i$ is the mean before any updates. The first p nodes are the actual measurements, and they represent p realizations of random variables. For each of these nodes, the residual is calculated:

$$\text{for } i = 1 \text{ to } p: \quad r_i = (\zeta_i - \mu_i) \tag{125}$$

The change of mean (the residual) and the mean itself (the assumed measurement) for node p+1 are calculated by:

$$r_{p+1} = b_{1,p+1} r_1 + b_{2,p+1} r_2 + b_{3,p+1} r_3 + \cdots + b_{p,p+1} r_p \tag{126}$$

$$\zeta_{p+1} = r_{p+1} + \mu_{p+1} \tag{127}$$

The residual for node p+1 is propagated to node p+2 just as if it were the result of a measurement. The equation for this propagation and the resulting mean is:

$$r_{p+2} = (b_{1,p+2} r_1 + b_{2,p+2} r_2 + b_{3,p+2} r_3 + \cdots + b_{p,p+2} r_p) + (b_{p+1,p+2} r_{p+1}) \tag{128}$$

$$\zeta_{p+2} = r_{p+2} + \mu_{p+2} \tag{129}$$
The process continues with each new residual treated as a measurement to be propagated to the next node. The term in parentheses on the second line of Equation (128) is the residual from the assumed measurement created by propagating to the p+1 node. The calculations for the last, or n+p node, are:

$$r_{p+n} = (b_{1,p+n} r_1 + b_{2,p+n} r_2 + b_{3,p+n} r_3 + \cdots + b_{p,p+n} r_p) + (b_{p+1,p+n} r_{p+1} + b_{p+2,p+n} r_{p+2} + \cdots + b_{p+n-1,p+n} r_{p+n-1}) \tag{130}$$

$$\zeta_{p+n} = r_{p+n} + \mu_{p+n} \tag{131}$$

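These calculations can be simplified to a series of matrix operations. If there are p nodes to be measured, with a mean propagated to n successor nodes, then the residuals for the successor nodes p+1 to p+n are given by:

$$\begin{bmatrix} r_{p+1} \\ r_{p+2} \\ \vdots \\ r_{p+n} \end{bmatrix} = \begin{bmatrix} b_{1,p+1} & b_{2,p+1} & \cdots & b_{p,p+1} & 0 & 0 & \cdots & 0 \\ b_{1,p+2} & b_{2,p+2} & \cdots & b_{p,p+2} & b_{p+1,p+2} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\ b_{1,p+n} & b_{2,p+n} & \cdots & b_{p,p+n} & b_{p+1,p+n} & b_{p+2,p+n} & \cdots & b_{p+n-1,p+n} \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \\ r_{p+1} \\ r_{p+2} \\ \vdots \\ r_{p+n-1} \end{bmatrix} \tag{132}$$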

The matrix operation shown above implies that the update uses a single matrix. The true operation is the sum of a series of vector inner products. The matrix is the transpose of the p+1 through p+n columns of the B matrix. The first p elements of the rightmost residual column vector are partitioned as the residuals from the measurement vector. The result of each inner product operation is augmented to the residual column vector as a new residual and used in the next inner product calculation. The total number of operations required for these calculations is $\frac{1}{2}n^2 - \frac{1}{2}n + pn$ adds and multiplies. The effect is comparable to performing a vector update instead of sequential scalar updates. An example will demonstrate the algorithm.

Figure 41. Example of Continuous Gaussian Influence Diagram
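The operation counts quoted in this section can be tallied with a short sketch. It counts one multiply per nonzero coefficient in each row of the residual matrix and compares the total with the closed form, alongside the cubic count quoted for recalculating the path-coefficient matrix. The function names are illustrative assumptions.

```python
def vector_update_multiplies(p, n):
    """Multiplies in the vector update: the row for successor p+k (k = 1..n)
    has p measurement residuals plus k-1 earlier successor residuals."""
    return sum(p + (k - 1) for k in range(1, n + 1))


def vector_update_closed_form(p, n):
    """Closed form n^2/2 - n/2 + p*n for the same count."""
    return (n * n - n) // 2 + p * n


def path_matrix_multiplies(n):
    """Count quoted for recalculating U: (n^3 - 3n^2 + 2n) / 6."""
    return (n ** 3 - 3 * n ** 2 + 2 * n) // 6


counts = {(p, n): vector_update_multiplies(p, n)
          for p in range(1, 5) for n in range(1, 8)}
```

For example, with p = 2 measurements and n = 6 successors the vector update needs 27 multiplies, and the count grows quadratically rather than cubically as nodes are added.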

In  the  example  given  in  Chapter  2,  the  conditional  mean  of  the  playing  time  in  a  basketball 
game  was  expressed  in  influence  diagram  form,  conditioned  on  the  player’s  height  and  the  number 
of  points  scored.  The  diagram  is  repeated  here  in  Figure  41.  The  conditional  mean  of  the  playing 
time  was  calculated  as  the  update  of  the  two  scalar  measurements  of  height  and  points  scored  per 
game.  First,  the  residual  of  the  measured  height  was  propagated  to  subsequent  nodes  and  the 
height  node  was  instantiated.  Then  the  residual  of  the  realized  number  of  points  per  game  was 
propagated  to  the  remaining  node.  The  conditional  mean  of  playing  time,  given  an  84  inch  player 
was  75.8889%.  If  an  84  inch  player  scored  only  16  points  per  game,  then  the  conditional  mean  of 
the  playing  time,  conditioned  on  both  measurements,  was  68.558%. 

Both  of  these  conditional  means  could  be  calculated  by  the  vector  update  algorithm.  For  the 
first  example,  the  average  playing  time  will  be  calculated  based  on  only  the  player’s  height.  The 
equations  are: 

$$b_{12} r_1 = r_2 \qquad r_2 + \mu_2 = \zeta_2 \tag{133}$$

$$b_{13} r_1 + b_{23} r_2 = r_3 \qquad r_3 + \mu_3 = \zeta_3 \tag{134}$$

or in numerical terms:

$$0.2222(84 - 82) = 0.4444 \qquad 0.4444 + 20 = 20.4444 \tag{135}$$

$$0.077922(84 - 82) + 1.6494(0.4444) = 0.8889 \qquad 0.8889 + 75 = 75.8889 \tag{136}$$

The conditional means for both the number of points scored and the percentage of playing time are the same as calculated earlier. These equations use the residual $r_2$. This is not a true residual because there is no measurement of the points per game. This number is treated as a residual only to make the algorithm work.

As  a  second  example,  assume  the  measured  height  and  the  number  of  points  scored  will  be 
used  as  a  measurement  vector,  where  the  dimension  of  the  vector  is  p=2.  The  update  equations 
are  now: 

$$b_{13} r_1 + b_{23} r_2 = r_3 \qquad r_3 + \mu_3 = \zeta_3 \tag{137}$$

or in numerical terms:

$$0.077922(84 - 82) + 1.6494(16 - 20) = -6.4417 \qquad -6.4417 + 75 = 68.558 \tag{138}$$

Again,  the  vector  update  equations  yield  the  same  results  as  the  earlier  example.  The  average 
points  per  game,  conditioned  on  the  player’s  height,  was  not  calculated.  There  was  no  need  to 
calculate  it  because  it  was  part  of  the  measurement  vector  and  it  would  take  on  the  measured  value 
no  matter  what  its  conditional  mean  was. 
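The vector update can be sketched in a few lines of Python and run on both basketball examples. The function name `vector_update` and its argument layout are assumptions made for illustration; nodes are indexed 0 (height), 1 (points per game), and 2 (playing time), and the coefficients and means are those used in the numerical equations above.

```python
def vector_update(means, b, measurements):
    """Vector measurement update of the means (illustrative sketch).

    means        : prior means in node order
    b            : dict mapping (i, j), i < j, to the regression coefficient
    measurements : realized values of the first p nodes
    Returns the list of updated means for all nodes.
    """
    # True residuals of the p measured nodes.
    residuals = [z - m for z, m in zip(measurements, means)]
    # Each later node gets a residual from one inner product with all earlier
    # residuals; that residual is then treated as if it were measured.
    for j in range(len(measurements), len(means)):
        residuals.append(sum(b.get((i, j), 0.0) * residuals[i] for i in range(j)))
    return [m + r for m, r in zip(means, residuals)]


b = {(0, 1): 0.2222, (0, 2): 0.077922, (1, 2): 1.6494}
means = [82.0, 20.0, 75.0]

height_only = vector_update(means, b, [84.0])            # p = 1
height_and_points = vector_update(means, b, [84.0, 16.0])  # p = 2
```

With only the height measured, the update returns approximately 20.4444 points and 75.8889 percent playing time; with both measurements it returns approximately 68.558 percent, matching the scalar updates of Chapter 2.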

3.3 Efficient Implementation Form

Kenley demonstrated that the influence diagram was competitive with the U-D filter in terms of the number of required mathematical operations [6:pp. 52-106]. He used a form of the influence diagram as was shown in Figure 18. In that diagram, the nodes of $w_d(t_{i-1})$ have no arrows between them, implying they are independent. This is equivalent to assuming the matrix product $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ has been factored such that $Q_d(t_{i-1})$ is a diagonal matrix.

The example in Chapter 2 of this thesis used an influence diagram implementation of the discrete-time filter with different assumptions about the factorization of $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$. In Figure 21, $w_d(t_{i-1})$ was expressed as an influence diagram factorization of the matrix product $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ and $G_d(t_{i-1})$ was assumed to be the identity matrix. The general case of the discrete-time filter, assuming $G_d(t_{i-1}) = I$, is shown in Figure 42. The influence diagrams in Figure 18 and Figure 42 are different representations of the same discrete-time model. When $w_d(t_{i-1})$ is removed, both diagrams reduce to the same influence diagram shown in Figure 43.

Because of the identity matrix transformation, and the fact that $x(t_i)$ is a deterministic
function of $w_d(t_{i-1})$, the removal of $w_d(t_{i-1})$ results in a simple form. The conditional
variances of the nodes of $x(t_i)$ in Figure 43 are the same as the conditional variances of the nodes
of $w_d(t_{i-1})$ in Figure 42. Furthermore, the regression coefficients between the nodes of $x(t_i)$ in
Figure 43 are the same as the regression coefficients between the nodes of $w_d(t_{i-1})$ in Figure 42.
A demonstration of this can be seen by comparing the appropriate numbers for the example in
Chapter 2, Figure 21 and Figure 27. In those figures, the variances and regression coefficients of
$w_d(t_{i-1})$ of Figure 21 appear to transfer directly to the nodes of $x(t_i)$ in Figure 27.

A further comparison of these example diagrams reveals that the regression coefficients
corresponding to the $\Phi(t_i, t_{i-1})$ matrix in Figure 21 are not the same as the comparable
regression coefficients in Figure 27. The new regression coefficients are no longer the elements of
the state transition matrix $\Phi(t_i, t_{i-1})$, but have been modified because of the influence
diagram operations between Figure 21 and Figure 27. These new regression coefficients can be thought
of as being the elements of a new matrix. Call this equivalent matrix $\Phi'$. The elements of $\Phi'$
can be calculated as

$\Phi' = (I - B_d^T)\,\Phi(t_i, t_{i-1})$    (139)

where $B_d$ is the influence diagram B matrix corresponding to the influence diagram factorization of
$G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ and I is the identity matrix of appropriate dimension. In the
previous example, the $\Phi'$ can be calculated as:

$\Phi' = \left( I - \begin{bmatrix} 0 & -0.7628 & -0.2921 \\ 0 & 0 & -0.1400 \\ 0 & 0 & 0 \end{bmatrix}^T \right) \begin{bmatrix} 1 & 1 & 0.4261 \\ 0 & 1 & 0.7870 \\ 0 & 0 & 0.6065 \end{bmatrix}$    (140)

$= \begin{bmatrix} 1 & 1 & 0.4261 \\ 0.7628 & 1.7628 & 1.112 \\ 0.2921 & 0.4322 & 0.8412 \end{bmatrix}$    (141)

These are seen to be the same as the regression coefficients in Figure 27.

Figure 42. Alternative Representation of One Propagation/Update Cycle of the Influence Diagram
Discrete Time Filter

Figure 43. Influence Diagram Discrete-Time Filter with $w_d(t_{i-1})$ Removed
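The numerical example can be checked directly. The sketch below assumes that Equation (139) has the form $\Phi' = (I - B_d^T)\Phi(t_i, t_{i-1})$, with $B_d$ strictly upper triangular; this form is an assumption that reproduces the printed result, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Values taken from the example in the text.
PHI = [[1.0, 1.0, 0.4261],
       [0.0, 1.0, 0.7870],
       [0.0, 0.0, 0.6065]]
B_D = [[0.0, -0.7628, -0.2921],
       [0.0, 0.0, -0.1400],
       [0.0, 0.0, 0.0]]

# Assumed form of Equation (139): Phi' = (I - B_d^T) Phi.
b_t = transpose(B_D)
i_minus_bt = [[(1.0 if r == c else 0.0) - b_t[r][c] for c in range(3)]
              for r in range(3)]
phi_prime = matmul(i_minus_bt, PHI)

for row in phi_prime:
    print([round(x, 4) for x in row])
```

Each row of $\Phi'$ is the corresponding row of $\Phi$ plus multiples of the rows above it, which is why the first row is unchanged.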


One way to understand these changes in the regression coefficients corresponding to the
$\Phi(t_i, t_{i-1})$ matrix is as follows. The total effect of all regression coefficients from a node
of $x(t_{i-1})$ to a node of $x(t_i)$ is equivalent to the appropriate element of the
$\Phi(t_i, t_{i-1})$ matrix. This total effect is the change in the conditional mean of $x(t_i)$, given
a realization of the predecessor nodes in $x(t_{i-1})$. This total effect does not change when the
nodes of $x(t_i)$ are conditioned on one another. As the regression coefficients are added between
the nodes of $x(t_i)$, the coefficients from $x(t_{i-1})$ must be changed to compensate.


The previous analysis demonstrates a significant reduction in computations when the matrix
product $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is given in influence diagram form. Under these
circumstances, all influence diagram operations needed to remove $w_d(t_{i-1})$ are unnecessary.
Instead, an influence diagram as was shown in Figure 43 can be drawn immediately, with the
regression coefficients from $x(t_{i-1})$ modified as in Equation (139). In the example of Chapter 2,
this would have been equivalent to skipping from Figure 21 to Figure 27 without needing to go
through the intermediate equations. The author is not aware of any such operations savings with
other forms of the Kalman filter when $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is available in a
factored form.

3.4  Operations Count

An  important  characteristic  of  the  influence  diagram  algorithm  is  the  number  of  floating  point 
operations  required  for  implementation  on  a  digital  computer.  Kenley  calculated  the  number  of 
operations  necessary  for  updating  the  variances,  and  compared  them  with  the  operations  counts  for 
other filter implementations [6:pp. 89-106]. In that comparison, he assumed a filter of the form in
Figure  18.  He  demonstrated  that  the  U-D  filter  and  the  influence  diagram  require  a  similar  number 
of  operations. 

Kenley’s  original  operations  count  only  addressed  the  computations  required  for  the  update 
of  the  variances,  not  the  state  estimates.  If  the  influence  diagram  state  estimates  are  updated  using 
the  vector  update  method  from  the  previous  section,  then  the  influence  diagram  remains  equivalent 
to  the  U-D  filter  in  terms  of  computational  efficiency. 

The influence diagram has a significant advantage when $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is
available in influence diagram form. However, as was shown in Chapter 2, the influence diagram
form of the matrix product $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is the same as a factorization of
the form $U_n^T \cdots U_3^T U_2^T U_1^T D U_1 U_2 U_3 \cdots U_n$, where n is the dimension of the
matrix product. It will be shown later, in Chapter 4, that such a factorization is very similar to
the $U D U^T$ factorization used in the U-D form of the Kalman filter. Because of this similarity, it
will be assumed that a covariance matrix can be expressed in influence diagram form just as
efficiently as it can be expressed in U-D factored form.

Table 4 compares the conventional Kalman filter, the U-D filter, and the influence diagram.
The Kalman filter implementation assumes $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is available as a
single matrix. The U-D filter form assumes that $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is in U-D
factored form such that $G_d$ is equal to the upper triangular U factor and $Q_d$ is equal to the
diagonal D factor. The influence diagram implementation assumes
$G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is expressed in influence diagram form as in Figure 42. The
only operations needed to transform the influence diagram from a form as in Figure 42 to a form as
in Figure 43 are given in Equation (139).

The number of operations required for the influence diagram implementation is calculated in
Appendix A. The numbers of operations for the Kalman filter and the U-D filter are reproduced from
Maybeck [7:403] using the same assumptions about $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ as in the
previous paragraph. For a specific example, Table 5 presents the execution time for a typical
discrete-time filtering problem with a 10-dimensional state vector and a 2-dimensional measurement
vector. The execution times come from Maybeck [7:404]. These tables show that the influence diagram
significantly exceeds the U-D filter in speed (17.65 msec versus 21.77 msec per cycle in Table 5).

Under certain conditions, the matrices $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ and
$\Phi(t_i, t_{i-1})$ may be constant from one time interval to the next. Under these conditions, all
terms on the right side of Equation (139) are unchanged from one time interval to the next and there
is no need to recompute $\Phi'$. The operations count for the influence diagram will require n(n - 1)
fewer additions and multiplications if Equation (139) is unnecessary. Using the assumptions of
Table 5, this equates to (2.7 + 4.1)45 or 306 fewer microseconds per cycle.
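The execution times quoted for Table 5 follow from the operation counts and the per-operation times. The short check below uses only numbers stated in Table 5 and in the paragraph above; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Per-operation execution times in microseconds (from Table 5).
ADD_US, MUL_US, DIV_US = 2.7, 4.1, 6.6

# (adds, multiplies, divides) per filter recursion for n = 10, p = 2 (Table 5).
COUNTS = {
    "conventional Kalman": (1845, 2040, 2),
    "U-D": (2935, 3330, 29),
    "influence diagram": (2205, 2675, 110),
}


def cycle_time_msec(adds, multiplies, divides):
    """Total time for one recursion, converted from microseconds to msec."""
    return (adds * ADD_US + multiplies * MUL_US + divides * DIV_US) / 1000.0


times = {name: cycle_time_msec(*ops) for name, ops in COUNTS.items()}

# Savings when Equation (139) need not be recomputed: 45 adds and 45 multiplies.
savings_usec = (ADD_US + MUL_US) * 45

for name, t in times.items():
    print(f"{name}: {t:.2f} msec")
print(f"savings: {savings_usec:.0f} microseconds")
```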

3.4.1  Pipeline Processing. Kenley mentions that the influence diagram lends itself to parallel
processing because reversing the conditioning on two nodes results in changes that are isolated
within the diagram. A specific example shown in Figure 44 demonstrates the potential for parallel
processing.

Table 4. Operations for One Time Propagation and One Measurement Update

Filter               Adds                                              Multiplies                                         Divides
Conventional Kalman  $\frac{1}{2}(3n^3 + 3n^2 p + 5np - n)$            $\frac{1}{2}(3n^3 + 3n^2(p + 1) + 3np)$            $p$
U-D                  $\frac{1}{2}(5n^3 + n^2(3p + 2) + n(3p + 1))$     $\frac{1}{2}(5n^3 + n^2(3p + 11) + n(p - 6))$      $n(p + 1) - 1$
Influence Diagram    $2n^3 + n^2(p - 0.5) + n(p^2 + p - 0.5)$          $2n^3 + n^2(p + 3.5) + n(p^2 + 5p - 1.5)$          $n(n + p - 1)$

Assumptions: $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is available as a single matrix
for the Kalman filter, in U-D factored form for the U-D filter,
or in influence diagram form for the influence diagram.

State and dynamics driving noise dimension = n
Measurement dimension = p


Table 5. Operation Time for One Filter Recursion

Filter               Adds    Multiplies    Divides    Time (msec)
Conventional Kalman  1845    2040          2          13.36
U-D                  2935    3330          29         21.77
Influence Diagram    2205    2675          110        17.65

Assumptions: $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ known as in previous table.
State and dynamics driving noise dimension = 10
Measurement dimension = 2

The execution time per cycle assumes each operation requires:

2.7 μsec per addition
4.1 μsec per multiplication
6.6 μsec per division

Figure 44. Pipeline Implementation of the Removal of Three Nodes

Assume the first three nodes are to be moved to the right and eliminated from the diagram.
The third node is removed first because it is the most conditional. The arc between it and the
succeeding node is reversed, resulting in the second influence diagram in Figure 44. The next
operation could be either moving node 3 further to the right or moving node 2 to the right. The
predecessor values modified in one operation are distinct from those modified in the other operation.
As a result, both operations could be done simultaneously in separate processors. This simultaneous
reversal is shown in the third diagram in Figure 44. The next node can be moved to the right at the
same time as the first two move further rightwards, as in the fourth diagram. The process continues
until all three nodes are removed.

Assume  that  there  is  a  single  processor  dedicated  to  the  task  of  calculating  all  necessary 
equations  for  each  of  the  n  nodes  being  moved  rightwards.  This  is  only  one  possible  way  of  applying 
parallel  processing  to  the  influence  diagram  algorithm,  but  it  does  serve  as  a  basis  for  comparison. 
In  this  example,  there  would  be  three  processors,  one  assigned  to  each  of  the  first  three  nodes. 
Between  the  first  and  second  influence  diagrams,  a  single  processor  would  reverse  nodes  3  and  4. 
Between  the  second  and  third  diagrams,  one  processor  would  be  used  to  exchange  nodes  3  and  5, 
while  another  would  be  used  to  exchange  2  and  4.  Moving  from  the  third  to  the  fourth  diagram 
requires  all  three  processors;  one  processor  removes  node  3,  another  reverses  nodes  2  and  5,  while 
the  third  reverses  nodes  1  and  4.  Moving  to  the  fifth  diagram  requires  only  two  processors,  one  to 
remove  node  2  and  another  to  reverse  nodes  1  and  5.  Finally,  only  a  single  processor  is  needed  to 
remove node 1 and result in the last influence diagram.
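The staggered schedule described above can be generated mechanically. The following sketch assumes one processor per removed node and treats every reversal or removal as taking one stage of equal length. That uniform stage time is a simplifying assumption made here for illustration, and the function name is hypothetical.

```python
def pipeline_schedule(num_removed, num_kept):
    """List the operations performed in each pipeline stage.

    Nodes 1..num_removed are moved rightwards past nodes
    num_removed+1..num_removed+num_kept and then removed. The most
    conditional node (num_removed) starts first, and each earlier node starts
    one stage later.
    """
    stages = {}
    for node in range(num_removed, 0, -1):
        start = num_removed - node + 1
        for step in range(num_kept):
            successor = num_removed + 1 + step
            stages.setdefault(start + step, []).append(("reverse", node, successor))
        stages.setdefault(start + num_kept, []).append(("remove", node))
    return [stages[s] for s in sorted(stages)]


# The case of Figure 44: three nodes removed, two nodes kept.
schedule = pipeline_schedule(3, 2)
for number, operations in enumerate(schedule, start=1):
    print(number, operations)
```

For this case the nine operations fit into five stages, with at most three processors busy at once, which matches the walk-through in the preceding paragraph.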

Under these assumptions, the computation time for each cycle is reduced greatly. As an
example, assume the same operations counts as in Table 4 and Table 5. In this case, also assume
that there are n separate processors, each dedicated to the task of moving one node rightwards as
in Figure 44. The computation time for such a configuration is given in Table 6. As in Table 5,
the time can be reduced by another 306 microseconds if there is no need to recompute $\Phi'$. The
operations counts in Table 6 are developed in Appendix B. The reason that the time is not reduced
by a factor of n in Table 6 is that each processor is idle while it waits for the previous
processors to accomplish enough of their tasks for it to begin operation. Idle time can be reduced
by doing other operations (such as calculating Equation (139)).

Table 6. Operations Counts and Times for Influence Diagram, Pipeline Architecture

                 Adds                                    Multiplies                        Divides         Time (msec)
Formula          $9n^2 + n(6p - 19) + p^2 - 4p + 11$     $9n^2 + 6n(p - 1) + p^2 + 1$      $3n + p - 3$
n = 10, p = 2    837                                     965                               29              6.41

Assumptions: $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ is known beforehand,
in either matrix or influence diagram factored form.

State and dynamics driving noise dimension = 10
Measurement dimension = 2
The execution time per cycle assumes each operation
requires the following execution times:
addition = 2.7 μsec
multiplication = 4.1 μsec
division = 6.6 μsec
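The pipeline figures can be evaluated for other problem sizes. The sketch below uses the count formulas as they are read here from Table 6 (the printed table is partly illegible, so the formulas are an assumption that was checked only against the tabulated values for n = 10 and p = 2). The function names are hypothetical.

```python
ADD_US, MUL_US, DIV_US = 2.7, 4.1, 6.6  # microseconds per operation


def pipeline_counts(n, p):
    """Operation counts on the critical path, as read from Table 6."""
    adds = 9 * n**2 + n * (6 * p - 19) + p**2 - 4 * p + 11
    multiplies = 9 * n**2 + 6 * n * (p - 1) + p**2 + 1
    divides = 3 * n + p - 3
    return adds, multiplies, divides


def time_msec(adds, multiplies, divides):
    return (adds * ADD_US + multiplies * MUL_US + divides * DIV_US) / 1000.0


adds, multiplies, divides = pipeline_counts(10, 2)
pipeline_msec = time_msec(adds, multiplies, divides)

# Single-processor influence diagram counts from Table 5, for comparison.
sequential_msec = time_msec(2205, 2675, 110)
speedup = sequential_msec / pipeline_msec

print(adds, multiplies, divides)
print(f"{pipeline_msec:.2f} msec, speedup {speedup:.2f} with 10 processors")
```

The speedup is well below the processor count of 10, which is the idle-time effect noted in the text.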


3.5  Chapter Summary

This  chapter  demonstrated  three  different  methods  of  improving  the  implementation  of  the 
influence  diagram  for  discrete-time  filtering.  These  improvements  make  the  influence  diagram 
more  efficient  than  the  U-D  factored  form  of  the  Kalman  filter  under  the  conditions  given.  For 
the  special  case  where  Equation  (139)  does  not  need  to  be  calculated  for  each  time  interval,  the 
influence diagram is even faster. This chapter concluded with a demonstration of the influence
diagram implemented in parallel processing form. In this case, the estimated time per cycle for the
10-state, 2-measurement example was 6.41 msec (Table 6).

IV.  Equivalent Operations


The  influence  diagram’s  relationships  to  matrix  operations,  as  previously  shown  by  Kenley, 
were  described  in  Chapter  2  of  this  thesis.  These  matrix  operations  will  be  a  key  to  numerical 
analysis  later  in  Chapter  5,  so  this  chapter  is  dedicated  to  a  detailed  analysis  of  the  influence 
diagram  operations.  The  intent  is  to  relate  the  influence  diagram  operations  to  well  understood 
arithmetic  and  matrix  operations.  The  analysis  will  occur  in  three  parts.  The  first  part  will  be 
a  demonstration  of  the  equivalence  of  matrix  products  and  node  reversal  operations.  The  second 
part  will  be  a  demonstration  of  a  matrix  equivalent  to  the  discrete-time  filter  shown  in  Chapter 
2.  The  third  part  will  be  a  direct  comparison  of  the  influence  diagram  and  the  U-D  algorithm  for 
discrete-time  filtering. 

4.1  Equivalence  to  Matrix  Operations 

The  most  important  operation  in  influence  diagram  manipulation  is  node  reversal.  It  is 
equivalent  to  changing  the  conditioning  order  of  a  pair  of  random  variables  while  maintaining  the 
joint  distribution  of  the  two.  For  the  case  of  Gaussian  random  variables,  only  the  conditional  and 
unconditional  variances  and  means  are  needed  to  describe  the  continuous  density  functions.  The 
Gaussian  influence  diagram  calculates  conditional  variances  directly,  and  calculates  the  conditional 
means  as  linear  functions  of  the  realizations  of  conditioning  variables. 

Gaussian influence diagram operations can be considered in terms of two different operations.
One operation is the process of moving a node upwards, towards the beginning of an ordered
sequence. It is equivalent to expressing a random variable as being conditioned on fewer random
variables than it is currently. The other operation is moving a node downwards, toward the end
of the ordered sequence. It is equivalent to expressing a random variable as being conditioned on
more random variables than it is currently.

Before discussing these two types of operations, there is one notable characteristic of the
influence diagram worth observing. If node i is a conditional predecessor of node j, and both
are conditioned on a set of nodes K, then Kenley demonstrates the conditional covariance between
them, conditioned on K, is equal to $u_{ij} v_i$, where $v_i$ is the conditional variance of node i
[12:534]. For a given set of conditioning variables, in this case K, the conditional covariance
matrix is invariant. This means that $v_i$ and $u_{ij} v_i$, the conditional variance and covariance,
are unchanged as well. Recall that $u_{ij}$ is the path coefficient as defined in Chapter 3 and
represented in Equation (101). The author presents the following theorem and corollary, without
further proof, that will be useful in later calculations.

Theorem 1 If the conditional predecessors of a node are unchanged, then the path coefficient from
that node to a given successor node is constant for all possible orderings of successor nodes.

Corollary 1 For a given set of conditioning variables, the path coefficient from a given node to a
successor node is equivalent to the regression coefficient from the first to the second when the two
nodes are placed consecutively in an ordered sequence.

4.1.1  Exchanging a Node with a Predecessor. Consider now a single node in an influence
diagram, exchanged with its predecessor to make it less conditional. This continues until the node
is at the beginning of the ordered sequence, in unconditional form. If the node was originally the rth
node in the sequence, then r - 1 reversals are generally needed to move the node to the beginning.
By necessity, the variance of the node, expressed in unconditional form, must equal the rth term
on the diagonal of the covariance matrix that corresponds to the original influence diagram. In
matrix form, the rth diagonal term in the covariance matrix can be calculated by $U^T D U$ as

$\Sigma_{rr} = v_r + (u_{r-1,r})^2 v_{r-1} + (u_{r-2,r})^2 v_{r-2} + \cdots + (u_{2,r})^2 v_2 + (u_{1,r})^2 v_1$    (142)

Since no term in Equation (142) is negative, the result cannot be negative. Furthermore, the
unconditional variance can never be less than the value of the conditional variance $v_r$.


The influence diagram calculations to express a node in unconditional form must be equivalent
to the expression in Equation (142). To verify that the influence diagram operations yield the same
value, consider the general case of four nodes in an influence diagram, numbered sequentially,
corresponding to the U and D matrices as given in Equations (102) and (104). For the matrix
operations, $\Sigma_{44}$ is calculated by:

$\Sigma_{44} = v_4 + v_3 u_{34}^2 + v_2 u_{24}^2 + v_1 u_{14}^2$    (143)

$= v_4 + v_3 b_{34}^2 + v_2 (b_{24} + b_{23} b_{34})^2 + v_1 [b_{14} + b_{13} b_{34} + b_{12}(b_{24} + b_{23} b_{34})]^2$    (144)


Now assume that node 4 is being moved upwards to the unconditional position in the influence
diagram. The equations for this influence diagram operation were given in Chapter 2. These
equations result in new values for the variance and predecessor coefficients after one exchange as:

$v_4' = v_4 + v_3 b_{34}^2$    (145)

$b_{24}' = b_{24} + b_{23} b_{34}$    (146)

$b_{14}' = b_{14} + b_{13} b_{34}$    (147)

It is worth noticing that the equation for the regression coefficient $b_{24}'$ is equal to the value of
$u_{24}$ calculated earlier. This is a demonstration of Corollary 1: the path coefficient $u_{24}$ is
equal to the regression coefficient $b_{24}'$ when node 4 immediately follows node 2 in the ordered
sequence. Also, the coefficient $b_{14}'$ is equal to the first two terms of the previously calculated
$u_{14}$ term and $b_{24}'$ is equal to the terms in parentheses in Equation (107).

Another exchange upward results in

$v_4'' = v_4' + v_2 (b_{24}')^2$    (148)

$b_{14}'' = b_{14}' + b_{12} b_{24}'$    (149)

so that the new $b_{14}''$ term is the same as the calculated $u_{14}$. The last exchange moves the 4th
node to the top and the resulting variance is

$v_4''' = v_4'' + v_1 (b_{14}'')^2$    (150)

$= v_4 + v_3 b_{34}^2 + v_2 (b_{24} + b_{23} b_{34})^2 + v_1 [b_{14} + b_{13} b_{34} + b_{12}(b_{24} + b_{23} b_{34})]^2$    (151)

When node 4 is eventually moved to the upper left of the matrix and put into unconditional form,
its variance will equal the fourth diagonal term in the covariance matrix, that is, the $\Sigma_{44}$
term. The expression for this variance, as calculated by the influence diagram algorithm, is
identical to the expression given in Equation (144), computed by straightforward matrix
multiplication of the original $U^T D U$. The conclusion to be drawn is that the influence diagram
algorithm for moving a node upwards is identical to calculating the diagonal terms of the P matrix
by multiplying $U^T D U$.
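The equivalence can be checked numerically. The sketch below moves the last node of a four-node diagram to the unconditional position using the exchange equations (145) through (150) and compares the result with the diagonal term computed from the path coefficients as in Equation (143). The regression coefficients and variances are arbitrary illustrative values, and the function names are hypothetical.

```python
def path_coefficients(b):
    """Return U = (I - B)^-1 for a strictly upper triangular B (nested lists)."""
    n = len(b)
    u = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for j in range(n):
        for i in range(j - 1, -1, -1):
            u[i][j] = sum(b[i][k] * u[k][j] for k in range(i + 1, j + 1))
    return u


def move_last_node_up(b, v):
    """Exchange the last node with each predecessor in turn.

    Returns the unconditional variance and, for each predecessor, the
    regression coefficient in force at the moment of its exchange.
    """
    n = len(v)
    variance = v[n - 1]
    coef = [b[k][n - 1] for k in range(n - 1)]
    used = [0.0] * (n - 1)
    for i in range(n - 2, -1, -1):
        used[i] = coef[i]
        variance += v[i] * coef[i] ** 2          # as in Equations (145), (148), (150)
        for k in range(i):
            coef[k] += b[k][i] * coef[i]         # as in Equations (146), (147), (149)
    return variance, used


# Arbitrary illustrative values (not from the thesis).
B = [[0.0, 0.5, -0.25, 0.75],
     [0.0, 0.0, 1.5, -0.5],
     [0.0, 0.0, 0.0, 2.0],
     [0.0, 0.0, 0.0, 0.0]]
V = [4.0, 3.0, 2.0, 1.0]

U = path_coefficients(B)
sigma_44 = sum(U[k][3] ** 2 * V[k] for k in range(4))   # Equation (143)
variance_44, coefficients_at_exchange = move_last_node_up(B, V)
print(sigma_44, variance_44, coefficients_at_exchange)
```

The coefficients recorded at each exchange equal the path coefficients into node 4, which is the behavior stated in Corollary 1.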

The  factored  form  of  the  matrix  also  demonstrates  the  concept  of  “nuisance”  variables.  In 
the  above  example,  if  the  values  for  nodes  1,  2,  and  3,  are  not  needed  after  they  are  conditioned  on 
node  4,  then  there  is  no  need  to  save  them.  In  the  influence  diagram,  this  means  that  the  nuisance 
nodes  can  be  put  at  the  end  of  an  ordered  sequence,  and  removed. 

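In the matrix operation, the new (primed) values would replace the old ones after every
exchange. The rows and columns for the "nuisance" variables would not be needed. As an example,
if the D and U matrices are as before:

$D = \begin{bmatrix} v_1 & 0 & 0 & 0 \\ 0 & v_2 & 0 & 0 \\ 0 & 0 & v_3 & 0 \\ 0 & 0 & 0 & v_4 \end{bmatrix}$    (152)

$U = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & b_{12} & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & b_{13} & 0 \\ 0 & 1 & b_{23} & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & b_{14} \\ 0 & 1 & 0 & b_{24} \\ 0 & 0 & 1 & b_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix}$    (153)

or in terms of the path coefficients:

$U = \begin{bmatrix} 1 & u_{12} & u_{13} & u_{14} \\ 0 & 1 & u_{23} & u_{24} \\ 0 & 0 & 1 & u_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix}$    (154)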


then, after the first exchange, only the first three rows and columns need be stored. In this case,
U' and D' become:

$D' = \begin{bmatrix} v_1 & 0 & 0 \\ 0 & v_2 & 0 \\ 0 & 0 & v_4' \end{bmatrix}$    (155)

$U' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & b_{12} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & b_{14}' \\ 0 & 1 & b_{24}' \\ 0 & 0 & 1 \end{bmatrix}$    (156)

The fourth row and column of the matrices do not affect the unconditional variance of the third
term in the matrix. These extra rows and columns can be removed. After another exchange, the
matrices are two dimensional. After the last exchange, $D''' = v_4'''$ and $U''' = 1$. This demonstrates
that the matrix equivalent of removing a "nuisance" variable is the removal of the last row and
column of the U and D matrices, or equivalently, the covariance matrix P. The "nuisance" variable
corresponds to the last row and column, just as a "nuisance" variable in the influence diagram must
be moved to the end of the ordered sequence before it is removed.

4.1.2  Exchanging a Node with a Successor. For every node that moves up, another moves
down. The operation of moving a node downwards makes it "more" conditional in the sense that it
is conditioned on more variables. As it becomes more conditional, the conditional variance becomes
smaller. In the last example, these variables were discarded as "nuisance" variables. Sometimes,
though, the conditional variance is the desired value. Such was the case for the discrete-time filter
in which the conditional variance of the vector $x(t_i)$, conditioned on the measurement vector
$z(t_i)$, was the value being calculated.

There are two useful ways of demonstrating the effect of making a variable more conditional.
One  is  to  move  a  node  from  an  unconditional  position  to  a  conditional  position  where  it  has  several 
predecessors.  Another  way  is  to  move  several  nodes  down  one  step  as  would  happen  when  another 
node  is  moved  upwards  past  them  to  become  unconditional.  Both  processes  will  be  demonstrated. 

If four nodes represent an influence diagram, and the nodes are numbered sequentially in the
ordered sequence, then node 1 can be conditioned on all other nodes by exchanging it one at a
time with its successors. At each exchange, the new variance is recalculated in terms of the original
conditional variances. The process is shown in Figure 45.

Figure 45. Conditioning a Node on All Other Nodes

The calculations for the variance of node 1 as it moves downward are:

$v_2' = v_2 + b_{12}^2 v_1$    (157)

$v_1' = \frac{v_1 v_2}{v_2'}$    (158)

$b_{21} = \frac{b_{12} v_1}{v_2'}$    (159)

Moving node 1 down further requires an exchange of nodes 1 and 3. The equations for this next
exchange are:

$v_3' = v_3 + b_{13}^2 v_1'$    (160)

$b_{23}' = b_{23} + b_{21} b_{13}$    (161)

$v_1'' = \frac{v_1' v_3}{v_3'}$    (162)

$b_{31} = \frac{b_{13} v_1'}{v_3'}$    (163)

$b_{21}' = b_{21} - b_{23}' b_{31}$    (164)

Finally, node 1 is exchanged with node 4 so that node 1 is conditioned on all other nodes. The
resulting equations are:

$v_4' = v_4 + b_{14}^2 v_1''$    (165)

$b_{24}' = b_{24} + b_{21}' b_{14}$    (166)

$b_{34}' = b_{34} + b_{31} b_{14}$    (167)

$v_1''' = \frac{v_1'' v_4}{v_4'}$    (168)

$b_{41} = \frac{b_{14} v_1''}{v_4'}$    (169)

$b_{21}'' = b_{21}' - b_{24}' b_{41}$    (170)

$b_{31}' = b_{31} - b_{34}' b_{41}$    (171)

If the resulting equation for $v_1'''$ is solved in terms of the original variables, the result is:

$v_1''' = \frac{v_1 v_2 v_3 v_4}{b_{12}^2 v_1 v_3 v_4 + b_{13}^2 v_1 v_2 v_4 + b_{14}^2 v_1 v_2 v_3 + v_2 v_3 v_4}$    (172)

or, if $v_i \neq 0$ for i = 1, 2, 3, 4, then $v_1'''$ can be expressed as:

$\frac{1}{v_1'''} = \frac{1}{v_1} + \frac{b_{12}^2}{v_2} + \frac{b_{13}^2}{v_3} + \frac{b_{14}^2}{v_4}$    (173)
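The closed form can be confirmed by running the three exchanges in sequence. The sketch below tracks only the variance of node 1, using the variance updates of Equations (157), (158), (160), (162), (165) and (168), and compares the result with Equations (172) and (173). The numeric values are arbitrary illustrations and the function names are hypothetical.

```python
def condition_first_node(b1, v):
    """Move node 1 past each successor and return its final conditional variance.

    b1[j] is the original regression coefficient from node 1 to node j+1
    (b1[0] is unused); v holds the original conditional variances.
    """
    v1 = v[0]
    for j in range(1, len(v)):
        vj_new = v[j] + b1[j] ** 2 * v1      # successor variance after the exchange
        v1 = v1 * v[j] / vj_new              # node 1 variance after the exchange
    return v1


def closed_form_172(b1, v):
    """Equation (172) written out for four nodes."""
    v1, v2, v3, v4 = v
    denominator = (b1[1] ** 2 * v1 * v3 * v4 + b1[2] ** 2 * v1 * v2 * v4
                   + b1[3] ** 2 * v1 * v2 * v3 + v2 * v3 * v4)
    return v1 * v2 * v3 * v4 / denominator


def closed_form_173(b1, v):
    """Equation (173): sum of reciprocal terms, then inverted."""
    return 1.0 / (1.0 / v[0] + sum(b1[j] ** 2 / v[j] for j in range(1, len(v))))


# Arbitrary illustrative values (not from the thesis).
B1 = [0.0, 0.5, -0.25, 0.75]
V = [4.0, 3.0, 2.0, 1.0]

by_exchanges = condition_first_node(B1, V)
by_172 = closed_form_172(B1, V)
by_173 = closed_form_173(B1, V)
print(by_exchanges, by_172, by_173)
```

Only the coefficients leaving node 1 enter the result, because node 1 has no predecessors at the start of the sequence.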


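The matrix for the inverse covariance in factored form is $(U^T D U)^{-1}$ or
$U^{-1} D^{-1} U^{-T}$, where $-T$ implies the inverse of the transpose. For the example given, the
matrix operations are:

$\begin{bmatrix} 1 & -b_{12} & -b_{13} & -b_{14} \\ 0 & 1 & -b_{23} & -b_{24} \\ 0 & 0 & 1 & -b_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1/v_1 & 0 & 0 & 0 \\ 0 & 1/v_2 & 0 & 0 \\ 0 & 0 & 1/v_3 & 0 \\ 0 & 0 & 0 & 1/v_4 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ -b_{12} & 1 & 0 & 0 \\ -b_{13} & -b_{23} & 1 & 0 \\ -b_{14} & -b_{24} & -b_{34} & 1 \end{bmatrix}$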

By multiplying these matrices, it can be shown that the first diagonal term of the resulting matrix is
equal to the inverse of the previously calculated value for $v_1'''$. This process can be generalized to
show the conditional variance of a previously unconditional variable. If node 1 is to be conditioned
on all other nodes, and if no nodes are deterministic (conditional variance equals zero), then the
final conditional variance of node 1 has the form

$v_1' = \frac{1}{\frac{1}{v_1} + \frac{b_{12}^2}{v_2} + \frac{b_{13}^2}{v_3} + \frac{b_{14}^2}{v_4} + \cdots + \frac{b_{1n}^2}{v_n}}$

Instead of $v_1$, assume the jth node is conditioned on all other nodes. Also assume node j is
already conditioned on nodes 1 through j - 1, so it must be further conditioned on nodes j + 1
through n. If none of the nodes j through n are deterministic, then the conditional variance of
node j is given by either

$v_j' = \frac{1}{\frac{1}{v_j} + \frac{b_{j,j+1}^2}{v_{j+1}} + \frac{b_{j,j+2}^2}{v_{j+2}} + \frac{b_{j,j+3}^2}{v_{j+3}} + \cdots + \frac{b_{jn}^2}{v_n}}$    (175)

or

$\frac{1}{v_j'} = \frac{1}{v_j} + \frac{b_{j,j+1}^2}{v_{j+1}} + \frac{b_{j,j+2}^2}{v_{j+2}} + \frac{b_{j,j+3}^2}{v_{j+3}} + \cdots + \frac{b_{jn}^2}{v_n}$    (176)

The form given in Equation (176) would be the expression for the conditional variance in the
discrete-time filter if, for example, the jth node of the j dimensional vector $x(t_i)$ were to be
conditioned on the (n - j) dimensional vector $z(t_i)$.


The  form  given  in  Equation  (176)  is  important  because  it  is  the  jth  diagonal  term  of  the  inverse 
covariance  matrix.  This  demonstrates  that  the  conditional  variance  of  a  random  variable,  when 
conditioned  on  all  other  random  variables  in  the  vector,  is  equal  to  the  inverse  of  the  diagonal  term 
of  the  inverse  covariance  matrix.  It  also  demonstrates  the  relationship  between  the  conditioning 
of  a  random  variable  as  calculated  by  the  influence  diagram,  and  the  factored  form  of  the  inverse 
covariance  matrix. 
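This relationship can be tested against an explicit matrix inverse. The sketch below builds the covariance matrix as $U^T D U$ with $U = (I - B)^{-1}$, inverts it by Gauss-Jordan elimination, and compares the reciprocal of each diagonal term with the sum of the form in Equation (176). The numeric values are arbitrary illustrations and all function names are hypothetical.

```python
def path_coefficients(b):
    """Return U = (I - B)^-1 for a strictly upper triangular B."""
    n = len(b)
    u = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for j in range(n):
        for i in range(j - 1, -1, -1):
            u[i][j] = sum(b[i][k] * u[k][j] for k in range(i + 1, j + 1))
    return u


def covariance(b, v):
    """P = U^T D U with D = diag(v)."""
    u, n = path_coefficients(b), len(v)
    return [[sum(u[k][r] * u[k][c] * v[k] for k in range(n))
             for c in range(n)] for r in range(n)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fully_conditioned_variance(b, v, j):
    """Form of Equation (176), with 0-based node index j."""
    return 1.0 / (1.0 / v[j] + sum(b[j][k] ** 2 / v[k] for k in range(j + 1, len(v))))


# Arbitrary illustrative values (not from the thesis).
B = [[0.0, 0.5, -0.25, 0.75],
     [0.0, 0.0, 1.5, -0.5],
     [0.0, 0.0, 0.0, 2.0],
     [0.0, 0.0, 0.0, 0.0]]
V = [4.0, 3.0, 2.0, 1.0]

P_INV = invert(covariance(B, V))
from_inverse = [1.0 / P_INV[j][j] for j in range(4)]
from_formula = [fully_conditioned_variance(B, V, j) for j in range(4)]
print(from_inverse)
print(from_formula)
```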


The second of the two processes is a different look at the same kind of operation. Assume
now that node 4 is to be put in unconditional form. The remaining nodes will be conditioned on
node 4. The variances of these other nodes can be expressed in terms of the original variables.

$v_3' = v_3 \frac{v_4}{v_4 + b_{34}^2 v_3}$    (177)

$= v_3 \frac{v_4}{v_4 + u_{34}^2 v_3}$    (178)

$v_2' = v_2 \frac{v_4 + b_{34}^2 v_3}{v_4 + b_{34}^2 v_3 + (b_{24} + b_{23} b_{34})^2 v_2}$    (179)

$= v_2 \frac{v_4 + u_{34}^2 v_3}{v_4 + u_{34}^2 v_3 + u_{24}^2 v_2}$    (180)

$v_1' = v_1 \frac{v_4 + b_{34}^2 v_3 + (b_{24} + b_{23} b_{34})^2 v_2}{v_4 + b_{34}^2 v_3 + (b_{24} + b_{23} b_{34})^2 v_2 + [b_{14} + b_{13} b_{34} + b_{12}(b_{24} + b_{23} b_{34})]^2 v_1}$    (181)

$= v_1 \frac{v_4 + u_{34}^2 v_3 + u_{24}^2 v_2}{v_4 + u_{34}^2 v_3 + u_{24}^2 v_2 + u_{14}^2 v_1}$    (182)

The conditional variances represented by $v_1'$, $v_2'$, and $v_3'$ could occur in a discrete-time filter
when the three dimensional vector $x(t_i)$ is conditioned on the scalar measurement $z(t_i)$ represented
by node 4. These conditional variances are calculated by division, not subtraction. They are
expressed in terms of a fraction of the original variances. The numeric properties of this form will
be discussed in Chapter 5.
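A short numerical illustration of the division form follows. It evaluates the path-coefficient versions, Equations (178), (180) and (182), for arbitrary values and compares $v_1'$ with the familiar subtraction form $v_1 - (u_{14} v_1)^2 / \Sigma_{44}$. The subtraction expression is supplied here for comparison and is not taken from the text, and the names and numbers are hypothetical.

```python
def condition_on_last_node(u_last, v):
    """Conditional variances of nodes 1..n-1 after node n is made unconditional.

    u_last[k] is the path coefficient from node k+1 to node n, and v holds the
    original conditional variances. Returns the new variances and the
    unconditional variance of node n.
    """
    n = len(v)
    running = v[n - 1]                      # v_4, then v_4', v_4'', ...
    new_v = [0.0] * (n - 1)
    for k in range(n - 2, -1, -1):
        total = running + u_last[k] ** 2 * v[k]
        new_v[k] = v[k] * running / total   # a fraction of the original variance
        running = total
    return new_v, running


# Arbitrary illustrative values (not from the thesis).
V = [4.0, 3.0, 2.0, 1.0]
U_TO_4 = [1.5, 2.5, 2.0]                    # u_14, u_24, u_34

new_variances, sigma_44 = condition_on_last_node(U_TO_4, V)

# Subtraction form for node 1: prior variance minus the measurement correction.
v1_by_subtraction = V[0] - (U_TO_4[0] * V[0]) ** 2 / sigma_44

product_before = V[0] * V[1] * V[2] * V[3]
product_after = new_variances[0] * new_variances[1] * new_variances[2] * sigma_44
print(new_variances, sigma_44, v1_by_subtraction)
```

In the division form every factor is nonnegative, so a computed variance cannot become negative through cancellation. The product of the conditional variances is also unchanged by the reordering, since it equals the determinant of the covariance matrix.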


4.1.3  Matrix Representation of Node Reversal. The operation of exchanging nodes in a
covariance matrix P can itself be expressed in matrix form. The author offers this demonstration
of a matrix equivalent of the influence diagram operation of node reversal. Assume the inverse
factored matrix is $P^{-1} = U^{-1} D^{-1} U^{-T}$ and let $U^{-1} = V = (I - B)$. The factored form of
the covariance matrix is $V^{-T} D V^{-1}$. The exchange of rows and columns can be accomplished by
multiplying the covariance matrix, both before and after, by a transposition matrix of the form:

$R = \begin{bmatrix} I & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$    (183)

where $R = R^{-1}$.

The new covariance matrix is $RPR$. In this case, $R$ exchanges the $i$th and $j$th row and column,
where $j$ is the last row of the matrix, and $i$ is the second to the last row. In general, it is possible
to exchange any $i$th and $j$th row and column, as long as $i$ and $j$ differ by one. The new matrix, in
factored form, is $R V^{-T} D V^{-1} R$ or $(RV)^{-T} D (RV)^{-1}$. Unfortunately, the parenthetical expression
$RV$ is no longer a triangular matrix. It can be made triangular again by post-multiplying with a
matrix $X$ such that the factored form becomes $(RVX)^{-T}(X^T D X)(RVX)^{-1}$. There is an $X$ matrix that
retriangularizes the $RVX$ term and maintains a diagonal middle term. One such value of $X$ and $X^{-1}$
that satisfies these conditions is:


After the matrix operations are carried out, the result is $(RVX)^{-T}(X^T D X)(RVX)^{-1} = V'^{-T} D' V'^{-1}$,
where $D'$ is diagonal and $V'$ is upper triangular. This constitutes a reordering of the
variables while still in factored form. By necessity, it must be equivalent to the factorization of the
permuted matrix $RPR$. The expressions for $V'$ and $D'$ are:

$$
D' = \begin{bmatrix}
v_1 & & & & & 0 \\
& v_2 & & & & \\
& & v_3 & & & \\
& & & \ddots & & \\
& & & & v_j + b_{ij}^2 v_i & \\
0 & & & & & \dfrac{v_i v_j}{v_j + b_{ij}^2 v_i}
\end{bmatrix} \tag{187}
$$

The  important  details  of  these  matrices  are  the  operations  in  the  ith  and  jth  columns.  These  are 
seen  to  be  identical  to  the  influence  diagram  operations  necessary  for  reversing  the  ith  and  jth 
node. 
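
For two nodes, the equivalence between node reversal and the permuted covariance matrix $RPR$ can be checked directly. The sketch below is an illustration under assumed values. The helper names `covariance_two_nodes` and `reverse_arc` are hypothetical, and the new regression coefficient is taken to be $b_{ij} v_i / (v_j + b_{ij}^2 v_i)$, the same form as the scalar gain derived in Section 5.3.1.

```python
def covariance_two_nodes(v_first, v_second, b):
    """Covariance U^T D U for two nodes with an arc of weight b from first to second."""
    return [[v_first, b * v_first],
            [b * v_first, v_second + b * b * v_first]]


def reverse_arc(v_i, v_j, b_ij):
    """Reverse the arc i -> j; node j becomes the unconditional node."""
    v_j_new = v_j + b_ij ** 2 * v_i      # node j made less conditional
    v_i_new = v_i * v_j / v_j_new        # node i made more conditional
    b_ji = b_ij * v_i / v_j_new          # coefficient on the reversed arc
    return v_j_new, v_i_new, b_ji


# Illustrative values only.
v_i, v_j, b_ij = 4.0, 1.0, 0.5
P = covariance_two_nodes(v_i, v_j, b_ij)

# For the 2x2 transposition matrix R, R P R swaps both rows and both columns.
RPR = [[P[1][1], P[1][0]],
       [P[0][1], P[0][0]]]

v_j_new, v_i_new, b_ji = reverse_arc(v_i, v_j, b_ij)
P_reversed = covariance_two_nodes(v_j_new, v_i_new, b_ji)

print(RPR)
print(P_reversed)
```

Both printed matrices should agree, showing that the reversed diagram is a factorization of the permuted covariance matrix.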


4.1.4 Comparison with Kalman Filtering. The influence diagram is equivalent to a factored
form of the covariance matrix. One cycle of the influence diagram for the discrete-time filter can
be represented in two useful forms, as was shown in Figures 18 and 42. Each of these diagrams
represents a singular covariance matrix $P$, or its factored form $U^T D U$. When the matrices are
correctly defined, the matrix operations in the previous section can be used to perform the influence
diagram operations. This section will present the matrices necessary for the influence diagram
representation of the discrete-time filter.

Assume that the discrete-time filter is represented by the influence diagram in Figure 18.
Let the vector $x(t_{i-1})$ be represented in influence diagram form as a diagonal matrix $D_x$ and
a strictly upper triangular matrix $B_x$ such that the covariance matrix for $x(t_{i-1})$ is given by
$(I - B_x)^{-T} D_x (I - B_x)^{-1}$. The vector $w_d(t_{i-1})$ has a covariance represented by a diagonal matrix
$Q_d(t_{i-1})$ and a linear input matrix $G_d(t_{i-1})$. The vector $v(t_i)$ has a covariance matrix $R(t_i)$,
which is assumed to be diagonal. The case of $R(t_i)$ not being diagonal will be discussed later. With
these variables defined, the diagonal matrix for the entire discrete-time filter influence diagram is


$$
D_{diagram} = \begin{bmatrix}
D_x & 0 & 0 & 0 & 0 \\
0 & Q_d(t_{i-1}) & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & R(t_i) & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \tag{188}
$$

The dimension of the square matrix represented by the 0 term in the middle of the diagonal is
the same as the dimension of $D_x$. It corresponds to the deterministic function represented by
$x(t_i)$. The dimension of the square matrix represented by the last 0 is the number of terms in the
measurement vector. It corresponds to the deterministic function represented by $z(t_i)$.

The  B  matrix  for  the  influence  diagram  can  be  constructed  similarly.  Using  the  linear  relations 
from  the  influence  diagram,  the  matrix  is 

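$$
B_{diagram} = \begin{bmatrix}
B_x & 0 & \Phi^T(t_i, t_{i-1}) & 0 & 0 \\
0 & 0 & G_d^T(t_{i-1}) & 0 & 0 \\
0 & 0 & 0 & 0 & H^T(t_i) \\
0 & 0 & 0 & 0 & I \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \tag{189}
$$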

The  dimension  of  each  row  and  column  is  the  same  as  the  corresponding  rows  and  columns  of 
Equation  (188). 

These matrices, combined with the matrix operations from the previous section, can be used
to calculate the influence diagram operations using matrix software tools. These matrix operations
will accomplish the following steps. First, eliminate the second group of rows and columns from
both matrices by moving them to the end. This is equivalent to eliminating the vector $w_d(t_{i-1})$.
Then remove the first group of rows and columns in the same way, equivalent to removing the
vector $x(t_{i-1})$. Remove the rows and columns corresponding to $R(t_i)$, leaving only the rows and
columns for $x(t_i)$ and $z(t_i)$. Now reverse the two so that the rows and columns for $z(t_i)$ precede
those of $x(t_i)$. Finally, use the resulting $B$ matrix to update the states with the measurements,
and discard the first columns that correspond to $z(t_i)$. The diagonal and strictly upper triangular
matrices that remain are the influence diagram equivalent for the covariance matrix associated with
$x(t_i)$, after incorporating the measurements at time $t_i$. These remaining matrices can be inserted
back into the matrices of Equations (188) and (189) as $D_x$ and $B_x$, and the process starts over for
the next time interval.
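
As a rough check on the setup in Equations (188) and (189), the sketch below builds the two matrices for a scalar state and a scalar measurement and forms the covariance $U^T D U$ with $U = (I - B)^{-1}$. It assumes the scalar model $x(t_i) = \phi\, x(t_{i-1}) + g\, w$ and $z(t_i) = h\, x(t_i) + v$. The names `phi`, `g`, `h`, `unit_upper_inverse`, and `covariance`, and all numeric values, are illustrative assumptions. The node removals and reversals themselves are not performed here.

```python
def unit_upper_inverse(B):
    """Return U = (I - B)^-1 for strictly upper triangular B, from U = I + B U."""
    n = len(B)
    U = [[1.0 if row == col else 0.0 for col in range(n)] for row in range(n)]
    for col in range(n):
        for row in range(col - 1, -1, -1):
            U[row][col] = sum(B[row][k] * U[k][col] for k in range(row + 1, col + 1))
    return U


def covariance(B, d):
    """Return P = U^T D U, where d holds the diagonal of D."""
    n = len(d)
    U = unit_upper_inverse(B)
    return [[sum(U[k][a] * d[k] * U[k][c] for k in range(n)) for c in range(n)]
            for a in range(n)]


# Assumed scalar model (illustrative values only).
phi, g, h = 0.5, 1.0, 2.0          # state transition, noise input, measurement
p_prev, q, r_meas = 4.0, 1.0, 0.5  # variances of x(t_{i-1}), w, and v

# Node order: x(t_{i-1}), w, x(t_i), v, z(t_i)
d_diagram = [p_prev, q, 0.0, r_meas, 0.0]
B_diagram = [
    [0.0, 0.0, phi, 0.0, 0.0],   # x(t_{i-1}) -> x(t_i)
    [0.0, 0.0, g,   0.0, 0.0],   # w          -> x(t_i)
    [0.0, 0.0, 0.0, 0.0, h],     # x(t_i)     -> z(t_i)
    [0.0, 0.0, 0.0, 0.0, 1.0],   # v          -> z(t_i)
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

P = covariance(B_diagram, d_diagram)
print(P[2][2], phi ** 2 * p_prev + g ** 2 * q)   # propagated state variance
print(P[4][4], h ** 2 * P[2][2] + r_meas)        # measurement variance
```

The zero entries on the diagonal for $x(t_i)$ and $z(t_i)$ reflect that both are deterministic functions of their predecessors, yet the assembled covariance still recovers the familiar propagated variances.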

For the case of $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$ being represented in influence diagram form as in
Figure 42, there are some minor changes to the matrices of Equations (188) and (189). Instead of
$Q_d(t_{i-1})$, the appropriate diagonal term of $D_{diagram}$ is the diagonal matrix from the factored form
of $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$. The 0 in the second diagonal place of $B_{diagram}$ is replaced with the
$B$ matrix from the factored form of $G_d(t_{i-1}) Q_d(t_{i-1}) G_d^T(t_{i-1})$. The $G_d^T(t_{i-1})$ term of $B_{diagram}$
becomes the identity matrix. The description of the matrix operations remains unchanged.

If $R(t_i)$ is not diagonal, then it can be factored in influence diagram form. The diagonal
terms of the factorized matrix, called $D_v$, would replace the diagonal terms of $R(t_i)$. The zero
matrix in the second to the last diagonal position of $B_{diagram}$ would be replaced by $B_v$, the upper
triangular matrix terms from the factorization of $R(t_i)$.

The  descriptions  in  the  preceding  paragraphs  for  removing  columns  and  rows  are  identical 
to  the  operations  described  in  Chapter  2  for  implementing  the  influence  diagram.  The  matrices 
such  as  in  Equations  (188)  and  (189)  demonstrate  the  correct  setup  for  each  set  of  operations. 
Equations (185) and (187) are the matrix operations needed to accomplish the correct reversals
and reductions.

4.1.5 Comparison with U-D Filtering. The U-D factored form of the Kalman filter is a
stable, efficient method of calculating the means and covariances. The name refers to the factored
form of the covariance matrix, which is in the form of a unit upper triangular, diagonal, and a
transposed unit upper triangular matrix, or $UDU^T$. If the covariance matrix is nonsingular, then
the factorization is unique. If $D$ is itself factored into two identical matrices, then these matrices
take the form of a diagonal square root matrix $S^{1/2}$. Because it is diagonal, the transpose of $S^{1/2}$
is equal to itself. The factorization $(U S^{1/2})(S^{1/2} U^T)$ is also unique and is equivalent to an upper
triangular Cholesky decomposition. The conventional Cholesky decomposition takes the form of
$P = LL^T$, where the calculated matrix is lower triangular [13:pp. 133-143].

The influence diagram also uses a factored form of the covariance matrix, but uses $U_I^T D_I U_I$,
where the subscripts refer to a different $U$ and $D$ matrix for the influence diagram. It is apparent
that this can also be factored into lower triangular and upper triangular matrices by using $S_I^{1/2}$ as
the square root of $D_I$; $S_I^{1/2}$ is its own transpose. This factorization, $(S_I^{1/2} U_I)^T (S_I^{1/2} U_I)$, is the
Cholesky decomposition of the covariance matrix [12:pp. 547-548].

Both the lower triangular Cholesky and the upper triangular Cholesky decompositions are
unique, as are the $UDU^T$ and $U_I^T D_I U_I$ factorizations. The author offers the following theorem as
a comparison between the influence diagram and the U-D factored forms of the covariance matrix.

Theorem 2 If $P$ refers to a positive definite symmetric matrix with U-D factors of $UDU^T$, and
$P^*$ refers to the matrix $P$ with all rows and columns in reverse order, then $P^* = U^* D^* U^{*T}$ where
$D^* = D_I$ and $U^* = U_I^T$.

Proof. Let $R$ be a square matrix of the same order as $U$ and $D$ of the form:

$$
R = \begin{bmatrix}
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 1 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 1 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0
\end{bmatrix}
$$

Then $P^* = RPR$ and $R$ is its own inverse because $RR = I$.

\begin{align}
P^* &= R(UDU^T)R \tag{190} \\
P^* &= R[U(RR)D(RR)U^T]R \tag{191} \\
P^* &= (RUR)(RDR)(RU^TR) \tag{192}
\end{align}

The factorization is unique, therefore

\begin{align}
RUR &= U^* = U_I^T \tag{193} \\
RDR &= D^* = D_I \tag{194}
\end{align}

In  other  words,  P  is  the  covariance  matrix  for  a  state  vector,  and  P*  is  the  covariance  matrix 
for  the  state  vector  in  reverse  order.  The  U-D  factored  form  of  P  will  have  the  same  U  and  D 
matrices  as  the  influence  diagram  form  of  P*,  but  in  reverse  order. 
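
Theorem 2 can be checked numerically on a small example. The sketch below is an illustration only: `udu_factor`, `ldl_factor`, and `reverse_order` are hypothetical helper names, the test matrix is an arbitrary positive definite choice, and the unit lower triangular factor of the reversed matrix is assumed to play the role of $U_I^T$.

```python
def identity(n):
    return [[1.0 if row == col else 0.0 for col in range(n)] for row in range(n)]


def udu_factor(P):
    """Return (U, d) with P = U diag(d) U^T and U unit upper triangular."""
    n = len(P)
    U, d = identity(n), [0.0] * n
    for j in range(n - 1, -1, -1):          # starts at the lower right
        d[j] = P[j][j] - sum(U[j][k] ** 2 * d[k] for k in range(j + 1, n))
        for i in range(j):
            s = sum(U[i][k] * U[j][k] * d[k] for k in range(j + 1, n))
            U[i][j] = (P[i][j] - s) / d[j]
    return U, d


def ldl_factor(P):
    """Return (L, d) with P = L diag(d) L^T and L unit lower triangular."""
    n = len(P)
    L, d = identity(n), [0.0] * n
    for j in range(n):                      # starts at the top left
        d[j] = P[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (P[i][j] - s) / d[j]
    return L, d


def reverse_order(M):
    """Reverse the order of all rows and columns, i.e. form R M R."""
    return [row[::-1] for row in M[::-1]]


# Arbitrary symmetric positive definite matrix (illustrative values).
P = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]

U, d = udu_factor(P)
L_star, d_star = ldl_factor(reverse_order(P))

print(U)
print(reverse_order(L_star))   # should match U
print(d, d_star[::-1])         # should match each other
```

The two loops make the "mirror image" relationship concrete: one factorization sweeps from the lower right corner upward, the other from the top left corner downward.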

After  the  covariance  matrix  is  factored,  the  U-D  algorithm  does  not  normally  allow  the 
exchange  of  variable  order.  On  the  other  hand,  the  influence  diagram  algorithm  depends  on  the 
exchange  of  two  consecutive  variables  while  the  matrix  is  in  factored  form.  The  new  influence 
diagram corresponds to a factorization of a new covariance matrix that has a row and column pair
reversed. For a given covariance matrix, this reversal of a pair of rows and columns is equivalent
to exchanging the corresponding random variables in the original vector.

Another  insight  comes  from  this  analysis.  For  a  given  positive  definite  symmetric  matrix, 
there are only two triangular-diagonal factorizations. One factorization is $U D_1 U^T$ and the other
is $L D_2 L^T$, where $U$ is unit upper triangular, $L$ is unit lower triangular, and both $D_1$ and $D_2$ are
diagonal. The U-D filter uses the first factorization and the influence diagram uses the second.
The  distinction  between  the  two  implementations  is  the  influence  diagram  factorization  starts  at 
the  top  left  of  the  covariance  matrix  and  proceeds  downwards,  while  the  U-D  factorization  starts 
at  the  lower  right  and  works  upwards. 

The influence diagram and the U-D filter have similar forms, and they have the same values
if one of them represents the covariance of a reversed order vector. However, the influence diagram
is further factored into $U_I = U_1 U_2 U_3 \cdots U_n$. The further factorization does not require more
storage but does add insight into the meaning of the variables. This form stores the regression
coefficients $b_{ij}$ and uses them to calculate the path coefficients $u_{ij}$ instead of storing the path
coefficients directly.

4.2  Chapter  Summary 

This chapter demonstrated several similarities between influence diagram calculations and
matrix operations. The calculations associated with making a node less conditional are identical
to the calculation of the diagonal terms of the matrix product $U^T D U$. The calculations associated
with making a node more conditional are identical to the calculation of the inverse of the diagonal
terms of the matrix product $V D^{-1} V^T$. When two nodes in an influence diagram are exchanged,
there is a simple way of expressing, in matrix form, the combined operations of making one node
less conditional and one node more conditional. It was shown that the influence diagram operations
are equivalent to retriangularizing a matrix using another matrix $X$ as given in Equation (184).

This chapter also demonstrated a method for representing one complete cycle of a Kalman
filter in matrix form for influence diagram operations. This would be useful for implementing the
influence diagram algorithm on a computer without dedicated influence diagram software. Finally,
this chapter demonstrated the similarities between the U-D filter and the influence diagram. It was
proven that the influence diagram and U-D factorizations are equivalent in the same sense that the
lower triangular Cholesky and the upper triangular Cholesky decompositions are equivalent. In
this sense, they can be considered "mirror image" factorizations of the covariance matrix.

V. Numerical Properties


5.1  Matrix  Operations 

Because the influence diagram is based on matrix operations, the stability and numerical
properties of the algorithm can be shown using matrix operations. The basis for such analysis is
the matrix operation $U = (I - B)^{-1}$ and the factorization of the covariance matrix into the $U$
and $D$ matrices such that $P(t_i) = U^T D U$. The terms of the $U$ matrix can be computed by the
matrix inversion $(I - B)^{-1}$ and were given in Equation (101) as:

$$
u_{1,r} = b_{1,r} + b_{1,r-1}(u_{r-1,r}) + b_{1,r-2}(u_{r-2,r}) + \cdots + b_{1,2}(u_{2,r}) \tag{195}
$$

The terms of the $U$ matrix are the main source of error in the influence diagram. If $|u_{ij}| \ll
|b_{ij}|$, then the $u_{ij}$ term is calculated by subtracting terms of nearly equal size. This means there
will be a loss of significant digits.
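
The loss of significance can be seen in a three-node sketch where the regression coefficient $b_{13}$ nearly cancels the product $b_{12} b_{23}$, so that the path coefficient $u_{13} = b_{13} + b_{12} b_{23}$ is small. The values below are assumptions chosen to expose the effect in IEEE double precision. Exact rational arithmetic from Python's `fractions` module serves as the reference.

```python
from fractions import Fraction

# Assumed regression coefficients: b13 is almost exactly -(b12 * b23).
b12 = 1.0 / 3.0
b23 = 3.0
b13 = -1.0 + 2.0 ** -40

# Path coefficient computed in floating point.
u13_float = b13 + b12 * b23

# The same quantity computed exactly from the stored (already rounded) inputs.
u13_exact = Fraction(b13) + Fraction(b12) * Fraction(b23)

rel_err = abs(Fraction(u13_float) - u13_exact) / abs(u13_exact)
print(float(rel_err))               # about 6e-5, far above the ~1e-16 unit roundoff
print(abs(u13_float) / abs(b13))    # the path coefficient is tiny next to b13
```

A single rounding in the product $b_{12} b_{23}$ is harmless on its own, but after the subtraction it accounts for a large fraction of the small result.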

One set of circumstances that may cause a cancellation of significant digits is when a successor
node is nearly independent of a predecessor. If there are intervening nodes between the two, then the
regression coefficient from the predecessor to the successor may be much larger in magnitude than
the path coefficient. If the successor is made less conditional, and moved ahead of the intervening
nodes then, by Corollary 1, the regression coefficient will become the small arc coefficient. The
calculation will then require subtraction of two numbers of nearly equal size and may have large
relative error. Numerically, consider a regression coefficient $b_{13}$ to be approximately equal to the
negative of the path product $b_{12} b_{23}$. The path coefficient is calculated as $u_{13} = b_{13} + b_{12} b_{23}$. The
path coefficient $u_{13}$ will be much smaller in magnitude than the original value for $b_{13}$, and may
have large relative error. Some general rules for decreasing the likelihood of this type of error will
be discussed later in this chapter. However, the author knows of no consistent method for solving
this problem.

If $P$ is an unconditional covariance matrix, then the diagonal terms of the matrix product
$U^T D U$ are the unconditional variances of the random variables. They can be computed using
either the matrix product or the influence diagram algorithm. The equation for calculating the
unconditional variances was given in Equation (142) and is repeated here:

$$
\Sigma_{rr} = v_r + (u_{r-1,r})^2 v_{r-1} + (u_{r-2,r})^2 v_{r-2} + \cdots + (u_{2,r})^2 v_2 + (u_{1,r})^2 v_1 \tag{196}
$$


This  equation  shows  the  unconditional  variance  of  the  rth  term  in  the  matrix  can  be  calculated 
using  the  variances  and  path  coefficients  for  less  conditional  terms,  i.e.  terms  with  subscripts  less 
than  r. 

In  this  equation,  the  calculated  variance  is  the  sum  of  positive  products.  There  will  be  a 
high  numerical  accuracy  in  computing  this  sum  because  there  can  be  no  cancellation  of  significant 
digits,  as  occurs  with  the  small  difference  of  large  numbers.  Instead,  the  only  source  of  numerical 
difficulties occurs in the calculation of the $u_{ij}$ terms. However, cancellation of significant digits
in one of the $u_{ij}$ terms may not affect the overall relative error in calculating the unconditional
variance  unless  the  other  terms  being  added  together  are  relatively  small  as  well. 
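
The unconditional variance calculation of Equation (196) can be sketched as follows. This is a minimal illustration: the function names `path_coefficients` and `unconditional_variances` and the three-node values are assumptions, not part of the original algorithm description.

```python
def path_coefficients(B):
    """Return U = (I - B)^-1 for strictly upper triangular B.

    Each entry follows the recursion u_ij = b_ij + sum_k b_ik * u_kj
    over the nodes k strictly between i and j.
    """
    n = len(B)
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j - 1, -1, -1):
            U[i][j] = B[i][j] + sum(B[i][k] * U[k][j] for k in range(i + 1, j))
    return U


def unconditional_variances(B, v):
    """Diagonal of U^T D U: a sum of non-negative products for each node."""
    U = path_coefficients(B)
    return [sum(U[k][r] ** 2 * v[k] for k in range(r + 1)) for r in range(len(v))]


# Assumed three-node example: x2 = 2*x1 + e2 and x3 = x1 + 3*x2 + e3.
B = [[0.0, 2.0, 1.0],
     [0.0, 0.0, 3.0],
     [0.0, 0.0, 0.0]]
v = [1.0, 1.0, 1.0]

print(path_coefficients(B)[0][2])      # u13 = b13 + b12*b23 = 7.0
print(unconditional_variances(B, v))   # [1.0, 5.0, 59.0]
```

Any cancellation can only enter through `path_coefficients`; the final sum in `unconditional_variances` adds non-negative terms only.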

The variance of a variable, conditioned on all other variables, was shown to be the inverse
of the diagonal terms of the inverse covariance matrix $P^{-1}$. If any of the conditioning variables is
deterministic, i.e. has a variance of zero, then the variance of the conditioned variable is zero also.
If none of the conditioning variables is deterministic, then the variance of a variable, conditioned
on all other variables in the vector was given in Equation (176) as:

$$
\frac{1}{v_j'} = \frac{1}{v_j} + \frac{b_{j,j+1}^2}{v_{j+1}} + \frac{b_{j,j+2}^2}{v_{j+2}} + \frac{b_{j,j+3}^2}{v_{j+3}} + \cdots + \frac{b_{j,n}^2}{v_n} \tag{197}
$$


The implication of this form is that calculating the conditional variance is done with high
relative accuracy. There are no errors due to cancellation of significant digits as existed in the
calculation of unconditional variances. One possible source of problems is when one of the successor
variances is very small. The inverse will be very large and will dominate the other addends in the
sum. If the corresponding numerator $b_{ij}$ value is also small, and was calculated with high relative
error because of a cancellation of significant digits, then this relative error will be passed on to the
calculation of the variance.
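
The sum-of-reciprocals form above translates directly into code. The sketch below is illustrative: the function name `fully_conditional_variances` and the three-node values are assumptions, and every variance is assumed to be strictly positive (no deterministic nodes).

```python
def fully_conditional_variances(B, v):
    """Variance of each node given all other nodes.

    Uses 1/v_j' = 1/v_j + sum over successors k of b_jk^2 / v_k, which is
    the j-th diagonal term of (I - B) D^-1 (I - B)^T.
    Assumes every entry of v is strictly positive.
    """
    n = len(v)
    result = []
    for j in range(n):
        inverse = 1.0 / v[j] + sum(B[j][k] ** 2 / v[k] for k in range(j + 1, n))
        result.append(1.0 / inverse)
    return result


# Assumed three-node example: x2 = 2*x1 + e2 and x3 = x1 + 3*x2 + e3.
B = [[0.0, 2.0, 1.0],
     [0.0, 0.0, 3.0],
     [0.0, 0.0, 0.0]]
v = [1.0, 1.0, 1.0]

cond = fully_conditional_variances(B, v)
print(cond)   # 1/6, 1/10, and 1 for nodes 1, 2, and 3
```

Only the stored regression coefficients and variances are used, so no path coefficients (and none of their cancellation errors) enter the result.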

5.1.1 Stability of Calculations. Both increasing and decreasing the conditioning of a node
are numerically stable operations. In this sense, numeric stability is defined as the property that
the computed result of the algorithm is the same as the exactly computed solution of a problem
that has its values slightly perturbed from the true values [15].

In order to show stability, assume that the symbol $fl(\cdot)$ represents the floating point equivalent
of the operation in the parentheses. Wilkinson [15] shows that $fl(AB) = AB + E$ where $A$ and $B$
are matrices compatible for multiplication and $E$ is an error matrix. Similarly $fl(VD^{-1}V^T) =
VD^{-1}V^T + E$. If we define the matrix $E$ to be

\begin{align}
E = {} & (V D^{-1} E_v^T + V E_{d1} V^T + E_v D^{-1} V^T) \notag \\
       & + (V E_{d1} E_v^T + E_v E_{d1} V^T + E_v D^{-1} E_v^T) + (E_v E_{d1} E_v^T) \tag{198}
\end{align}

then

\begin{align}
fl(VD^{-1}V^T) &= VD^{-1}V^T + E \tag{199} \\
               &= (V + E_v)(D^{-1} + E_{d1})(V + E_v)^T \tag{200}
\end{align}

where $E_v$ and $E_{d1}$ are error matrices associated with the rounding errors in $V$ and $D^{-1}$, respectively.
The "1" in the subscript of $E_{d1}$ refers to the first of two error matrices associated with the diagonal
matrix $D$. The error matrix $E_{d1}$ is a diagonal matrix with elements of the same magnitude as
potential rounding errors in $D$. This is because the error in computing the inverse of the diagonal
terms is on the same order as the original rounding errors in the terms themselves [15]. The elements
of the error matrix $E_v$ are of the same relative magnitude as the potential rounding errors in the
elements of $V$ [16:232].

If $V$ is equal to $I - B$, then the matrix product $VD^{-1}V^T$ is the inverse of the covariance
matrix as shown in Chapter 3. By Equation (200), it can be seen that all error matrices have
elements on the same order of magnitude as the rounding errors in the initial values of $B$ and $D$.
Recall that the diagonal terms of the inverse covariance matrix are the inverse of the variances,
conditioned on all other variables, as calculated by the influence diagram. It can then be concluded
that the errors in the influence diagram algorithm for calculating the conditional covariance terms
are of the same relative magnitude as the rounding errors in the original variances and regression
coefficients.

Now assume that $P = U^T D U$ is the covariance matrix of an unconditional vector. The
algorithm for calculating the unconditional variances, represented by the diagonal terms of $P$, is
equivalent to the influence diagram operation of making a node unconditional. Using logic similar
to before, and using $U^T D U = V^{-T} D V^{-1}$:

$$
fl(U^T D U) = U^T D U + E \tag{201}
$$

where

$$
E = (U^T D E_u + U^T E_{d2} U + E_u^T D U) + (U^T E_{d2} E_u + E_u^T E_{d2} U + E_u^T D E_u) + (E_u^T E_{d2} E_u) \tag{202}
$$

and therefore

\begin{align}
fl(U^T D U) &= U^T D U + E \tag{203} \\
            &= (U + E_u)^T (D + E_{d2})(U + E_u) \tag{204}
\end{align}

The error matrix $E_{d2}$ is associated with rounding errors in $D$. It is diagonal and has elements
of the same order of magnitude as the rounding error in the elements of $D$. Similarly, $E_u$ is the
error associated with the matrix $U$. However, the elements of $E_u$ are not necessarily of the same
magnitude as the rounding errors in the original values of $D$ and $B$. Instead, $V = (I - B)$ used the
original regression coefficients and $U$ was calculated as the inverse of $V$. It can be shown, however,
that if $V^{-1} = U$, then

\begin{align}
fl(V^{-1}) &= (V + E_v)^{-1} \tag{205} \\
           &= (U + E_u) \tag{206}
\end{align}

so the errors can still be related to rounding errors in the original values [15]. There is an upper
bound on the error, $E_u$, associated with computing the inverse of a matrix $V$. However, as will
be shown in the next section, this upper bound is the product of $E_v$, the error matrix associated
with the original matrix $V$, and a positive scalar greater than unity [15:105]. The result is that the
errors in calculating the inverse of a matrix are almost invariably larger than the rounding errors
in the original matrix.

Using these demonstrations, the algorithms for calculating the conditional and the unconditional
variances are seen to be stable. Both are equivalent to a simple matrix multiply algorithm.
Such an algorithm is numerically stable, and will be relatively accurate as long as no severe
cancellation of significant digits occurs [15]. Because these matrix multiplications are in quadratic form,
only positive values are added and no cancellations can occur.

The  relative  accuracy  of  the  algorithm  for  calculating  conditional  variances  will  be  better  than 
the  algorithm  for  computing  unconditional  variances.  This  is  because  the  algorithm  for  calculating 
the  conditional  variances  has  errors  of  the  same  magnitude  as  the  rounding  errors  in  the  original 
values  of  variances  and  regression  coefficients.  The  algorithm  for  calculating  unconditional  variances 
has potentially larger errors, equivalent to the errors associated with matrix inversion.

5.2 Error Analysis


The  errors  in  calculating  the  variance  can  be  compared  with  other  operations  that  have  known 
error  bounds.  In  the  first  case  above,  the  variance  decreased  as  the  node  became  more  conditional. 
The  process  was  equivalent  to  forming  the  inverse  covariance  matrix. 

It has already been shown that, if $D$ is factored into the diagonal square roots $D = S^T S$ and
$P = (U^T S^T)(SU)$, then the $U^T D U$ factorization is equivalent to the lower triangular Cholesky
decomposition of the covariance matrix $P$. Similarly, the factorization $VD^{-1}V^T = P^{-1}$ is the
upper triangular Cholesky factorization of the inverse covariance matrix. The factorization of the
inverse is more important now because it yields the values of the regression coefficients directly.

Wilkinson shows the Cholesky factorization is the most accurate known factorization for a
symmetric, positive definite matrix [16:244]. If the matrices $L$ and $L^T$ represent the Cholesky
decomposition matrices of the matrix $P^{-1}$, then the error bounds can be given as

$$
LL^T = P^{-1} + F
$$

where $F$ is an error matrix. Under the assumption that inner products are accumulated in double
precision and then rounded to single precision, Wilkinson gives error bounds for the elements of
the $F$ matrix as

$$
|f_{rs}| \le \begin{cases}
|l_{rs} l_{ss}| \, 2^{-t} & (r > s) \\
|l_{sr} l_{rr}| \, 2^{-t} & (r < s) \\
l_{rr}^2 \, 2^{-t} & (r = s)
\end{cases} \tag{207}
$$

where $t$ is the number of significant binary digits used by the computer [16:232].


When the influence diagram is used to calculate the equivalent of $VD^{-1}V^T = P^{-1}$, there
will be additional (small) errors due to the middle term $D^{-1}$. Also, there are no inner products to store
in double precision as can be done with the matrix product. Instead, the variances in the influence
diagram are calculated for each node exchange and stored in single precision. Because of these
differences, the possible errors are larger than given by Wilkinson. One loose error bound
replaces $t$ by $t/2$. It is the equivalent of assuming that $t$ digits represent double precision and
$t/2$ digits represent single precision. These assumptions are very pessimistic and will overestimate
the error bound. Nevertheless, by simply substituting $t/2$ for $t$, and using the influence diagram
notation, the error bounds become

$$
VD^{-1}V^T = P^{-1} + F \tag{208}
$$

$$
|f_{ij}| \le \begin{cases}
|b_{ji} / v_i| \, 2^{-t/2} & (i > j) \\
|b_{ij} / v_j| \, 2^{-t/2} & (i < j) \\
(1 / v_i) \, 2^{-t/2} & (i = j)
\end{cases} \tag{209}
$$


The error bounds for the factored form $U^T D U$ are not as small. The matrix $U$ is formed by
the equivalent of taking the inverse $U = V^{-1} = (I - B)^{-1}$. The errors in the $V$ matrix are of the
same magnitude as the rounding errors due to floating point arithmetic. Using the notation and
values as shown by Wilkinson, the errors in the matrix $U = V^{-1}$ can be bounded as follows. Let
$\| \cdot \|$ represent any consistent matrix norm except the 2-norm. Also, if $t$ is the number of significant
binary digits in the computer, then let $t_1 = t - 0.08406$, where $t_1$ accounts for higher order error
terms. Then, for an $n$-dimensioned triangular matrix $V$ [15:105]:

$$
\frac{\| V^{-1} - U \|}{\| V^{-1} \|} \le \frac{n \, 2^{-t_1} \| V \| \| V^{-1} \|}{1 - n \, 2^{-t_1} \| V \| \| V^{-1} \|} \tag{210}
$$

This error bound is usually very pessimistic also. The product of the terms $\| V \| \| V^{-1} \|$ is
also called the condition number of the matrix with respect to inversion. Wilkinson shows that it
is possible for triangular matrices to have very large condition numbers, yet have very small errors
in calculating the inverse. In general, the inverse of a triangular matrix has an error such that:

$$
\| E_u \| \le f(n) \, 2^{-t} \, \| V^{-1} \| \tag{211}
$$

where $f(n)$ is a simple function of the dimension $n$ [15:106].

Strictly speaking, the maximum possible error in computing the inverse is given in Equation
(210). However, it is more likely that the errors will be much smaller, as given in Equation (211).
In spite of these error bounds, it is still possible for some elements of the inverse of a triangular
matrix to have a high relative error. This is the case in which severe cancellation takes place
during the inversion. Cancellation of significant digits in the inverse of the $V$ matrix is the same
as the cancellation of significant digits which occurs when a path coefficient $u_{ij}$ is relatively small
in magnitude compared to the regression coefficients used to calculate it.

5.3  Numeric  Examples 

The  following  examples  show  advantages  of  the  influence  diagram  and  some  of  the  problems 
with  numeric  deterioration  in  both  the  U-D  factored  form  of  the  Kalman  filter,  and  in  the  influence 
diagram.  The  first  example  is  an  application  of  the  influence  diagram  to  a  scalar  form  of  the 
Kalman  filter  update.  The  second  example  is  the  influence  diagram  solution  of  a  problem  proposed 
by  Bierman  [3:97].  It  demonstrates  some  of  the  conditions  for  error  in  the  influence  diagram.  This 
example also demonstrates one of the features of the influence diagram; it shows how a change in
the  order  of  operations  in  the  influence  diagram  may  avoid  the  numeric  error.  The  third  example 
shows  how  numeric  errors  can  occur  in  the  U-D  filter. 

5.3.1 Scalar Update Example. The first example is a demonstration that the influence diagram
is equivalent to the Kalman filter for the case of scalar variables. To make the demonstration
simpler, let $x$ and $z$ be two zero-mean, jointly Gaussian random variables. The terms $\hat{x}^-$ and $\hat{z}^-$
are defined as in the Kalman filter example of Chapter 2, and $\hat{x}^- = \hat{z}^- = 0$. These variables have
variances $P^-$ and $R$ respectively. The minus superscript indicates that these variables are the
estimates prior to the measurement update. The Kalman filter calculates the conditional mean and
variance of $x$ given $\zeta$ as the realization of $z$. The update equations for these scalar variables yield

\begin{align}
K &= \frac{P^- H}{P^- H^2 + R} \tag{212} \\
P^+ &= P^- - K H P^- \tag{213} \\
\hat{x}^+ &= \hat{x}^- + K[\zeta - H \hat{x}^-] = K \zeta \tag{214}
\end{align}

The  influence  diagram  for  the  same  conditions,  and  the  operations  for  the  update  are  shown 
in  Figure  46.  It  is  a  pair  of  nodes  labeled  x  and  z  with  variances  and  R  respectively.  The  arrow 
goes  from  a:  to  z  and  has  a  regression  coefficient  of  H.  The  variance  R  is  seen  to  be  the  conditional 
variance  of  z  given  x.  The  reversal  of  this  arrow  by  Bayes’  rule  yields  the  unconditional  variance  of 
z,  the  regression  coefficient  of  x  on  z  called  K,  and  the  conditional  variance  of  x  given  z  called  P'^ . 


$$A = P^- H^2 + R \tag{215}$$

$$P^+ = \frac{P^- R}{A} = \frac{P^- R}{P^- H^2 + R} \tag{216}$$

$$K = \frac{P^- H}{A} = \frac{P^- H}{P^- H^2 + R} \tag{217}$$

The scalar update of the state estimate is

$$\hat{x}^+ = \hat{x}^- + K\left[\zeta - \hat{z}^-\right] = K\zeta \tag{218}$$

It is apparent that these algorithms both yield the same Kalman gain $K$, conditional mean $\hat{x}^+$,
and conditional variance $P^+$. The Kalman gain is the coefficient on the arrow from z to x. It can
be shown that the value for $P^+$ is the same by using the Kalman filter form and converting it as:


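$$P^+ = P^- - K H P^- \tag{219}$$

$$P^+ = P^- - \left(\frac{P^- H}{P^- H^2 + R}\right) H P^- \tag{220}$$

$$P^+ = \frac{(P^- H)^2 + P^- R - (P^- H)^2}{P^- H^2 + R} \tag{221}$$

$$P^+ = \frac{P^- R}{P^- H^2 + R} \tag{222}$$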

In  the  matrix  form  of  the  Kalman  filter,  the  simplifications  given  above  are  not  permitted.  In 
the  scalar  form  however,  the  conditional  variance  is  calculated  using  division,  not  subtraction.  This 
form  is  numerically  superior  as  R  goes  to  zero.  The  influence  diagram  calculates  the  conditional 
variance  of  a  random  variable  using  the  numerically  superior  scalar  operation.  It  calculates  the 
conditional  variance  of  a  random  vector  as  the  conditional  variances  of  a  series  of  scalar  random 
variables,  each  using  the  numerically  superior  scalar  equation. 
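
The difference between the subtraction form of Equation (213) and the division form of Equation (222) can be seen in a short numerical sketch. The Python fragment below is illustrative only: the function name `scalar_update` and the test values are assumptions chosen for the demonstration, and ordinary IEEE double precision stands in for whatever arithmetic a real filter would use.

```python
def scalar_update(p_minus, h, r):
    """Return (K, P+ by subtraction, P+ by division) for one scalar update."""
    a = p_minus * h * h + r                   # unconditional variance of z, Eq. (215)
    k = p_minus * h / a                       # Kalman gain, Eq. (212)
    p_plus_sub = p_minus - k * h * p_minus    # subtraction form, Eq. (213)
    p_plus_div = p_minus * r / a              # division form, Eq. (222)
    return k, p_plus_sub, p_plus_div


# Hypothetical values: unit prior variance and a nearly perfect measurement.
k, p_sub, p_div = scalar_update(1.0, 1.0, 1e-20)
print(k, p_sub, p_div)   # 1.0 0.0 1e-20
```

With these values the sum $P^- H^2 + R$ rounds to 1, so the subtraction form returns a variance of exactly zero while the division form still returns the small positive value.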

5.3.2 Bierman’s Example. Bierman proposed a problem that was known to cause numeric
deterioration in the Kalman filter. Assume the initial estimates of $x_1$ and $x_2$ are zero and that
$P = \sigma^2 I$ where $\sigma = 1/\epsilon$ is large. Assume also that $\epsilon$ is small enough that, due to rounding, $1 + \epsilon \neq 1$
but $1 + \epsilon^2 = 1$. There are two observations for updating the mean and variance. The model for the
observations is:

$$\mathbf{z} = \begin{bmatrix} 1 & \epsilon \\ 1 & 1 \end{bmatrix} \mathbf{x} + \mathbf{v} \tag{223}$$

where  R  =  I.  Bierman  presented  the  results  for  the  conventional  Kalman  filter  using  scalar  updates, 
the  U-D  factorization  method,  the  Potter  covariance  square  root  method,  and  the  stabilized  Kalman 
filter.  The  exact  and  rounded  values  of  the  U-D  factorization  method  will  be  included  here  for 
comparison. 

Figure  47  shows  the  influence  diagram  implementation  of  this  example.  The  labels  on  the 
diagram  are  the  rounded  values,  with  the  exact  values  shown  in  a  table  on  the  diagram  itself.  In 
Figure  47,  the  exact  values  and  the  rounded  values  are  the  same. 

The objective of the influence diagram operations will be to move the $z_1$, $z_2$ vector to the
beginning of the ordered sequence. One way is to move $z_1$ to the beginning, then move $z_2$ next
to it. Using this process, the first operation will be to reverse the arrow from $x_2$ to $z_1$. Figure 48
shows the results of this reversal, where again, the rounded values are the same as the exact. There
is a potential problem in this diagram because the path coefficient from $x_1$ to $x_2$ is still zero, but
it is computed with regression coefficients that have large relative differences in magnitude. There
are no errors in rounding yet, but they are imminent.

The second operation will be to reverse the arrow from $x_1$ to $z_1$ so that $z_1$ will be unconditional
as shown in Figure 49. The rounding errors have caused cancellation of all significant digits in the
path coefficient from $z_1$ to $x_2$. It is now incorrectly calculated as $1/(2\epsilon) + (1)(-1/(2\epsilon)) = 0$. The true
value should be $1/(2\epsilon) + \left(\frac{1}{2\epsilon^2 + 1}\right)(-1/(2\epsilon))$, a small number calculated by the difference of two large
numbers. This error will be carried forward to all further operations and will eventually cause the
Kalman gain to be calculated in error.

QUANTITY        EXACT VALUE
$v_{x1}$        $1/\epsilon^2$
$v_{x2}$        $1/\epsilon^2$
$v_{z1}$        $1$
$v_{z2}$        $1$
$b_{x1,x2}$     $0$
$b_{x1,z1}$     $1$
$b_{x1,z2}$     $1$
$b_{x2,z1}$     $\epsilon$
$b_{x2,z2}$     $1$
$b_{z1,z2}$     $0$

Figure 47. Influence Diagram Formulation of Bierman’s Example

QUANTITY        EXACT VALUE
$v_{x1}$        $1/\epsilon^2$
$v_{x2}$        $1/(2\epsilon^2)$
$v_{z1}$        $2$
$v_{z2}$        $1$
$b_{x1,x2}$     $-1/(2\epsilon)$
$b_{x1,z1}$     $1$
$b_{x1,z2}$     $1$
$b_{z1,x2}$     $1/(2\epsilon)$
$b_{x2,z2}$     $1$
$b_{z1,z2}$     $0$

Figure 48. Influence Diagram After Exchanging Nodes $x_2$ and $z_1$

The third operation will be to reverse the arrow from $x_2$ to $z_2$, while the final operation will
be to reverse the arrow from $x_1$ to $z_2$. These operations are shown in Figure 50 and Figure 51
respectively.

The path coefficients from the vector z to the vector x are equivalent to a Kalman gain
matrix. The exact values of these path coefficients are essentially the same as the Kalman gains
computed by Bierman, with the exception of the path coefficient from $z_1$ to $x_2$. The influence
diagram incorrectly calculates this number as zero. These values are shown in Table 7.

This example shows the types of rounding errors that can occur in the influence diagram.
Specifically, the error occurred because of cancellation of all significant digits in the computation
of one of the regression coefficients. It also shows that an update using $z_1$ may be inaccurate, but
an update at $z_2$ may be large enough to cause the resulting relative error to be small.

QUANTITY        EXACT VALUE
$v_{x1}$        $2/(2\epsilon^2 + 1)$
$v_{x2}$        $1/(2\epsilon^2 + 1)$
$v_{z1}$        $2 + 1/\epsilon^2$
$v_{z2}$        $1 + 1/(2\epsilon^2)$
$b_{x1,x2}$     $-(\epsilon + 1)/(2\epsilon^2 + 1)$
$b_{z1,x1}$     $1/(2\epsilon^2 + 1)$
$b_{x1,z2}$     $1 - 1/(2\epsilon)$
$b_{z1,x2}$     $\epsilon/(2\epsilon^2 + 1)$
$b_{z2,x2}$     $1/(2\epsilon^2 + 1)$
$b_{z1,z2}$     $1/(2\epsilon)$

Figure 50. Influence Diagram After Exchanging Nodes $x_2$ and $z_2$

QUANTITY        EXACT VALUE
$v_{x1}$        $(2\epsilon^2 + 1)/(1 - 2\epsilon + 4\epsilon^2 + 2\epsilon^4)$
$v_{x2}$        $1/(2\epsilon^2 + 1)$
$v_{z1}$        $2 + 1/\epsilon^2$
$v_{z2}$        $(1 - 2\epsilon + 4\epsilon^2 + 2\epsilon^4)/(\epsilon^2 + 2\epsilon^4)$
$b_{x1,x2}$     $-(\epsilon + 1)/(2\epsilon^2 + 1)$
$b_{z1,x1}$     $(\epsilon^2 - \epsilon + 1)/(1 - 2\epsilon + 4\epsilon^2 + 2\epsilon^4)$
$b_{z2,x1}$     $(2\epsilon^2 - \epsilon)/(1 - 2\epsilon + 4\epsilon^2 + 2\epsilon^4)$
$b_{z1,x2}$     $\epsilon/(2\epsilon^2 + 1)$
$b_{z2,x2}$     $1/(2\epsilon^2 + 1)$
$b_{z1,z2}$     $(\epsilon + 1)/(2\epsilon^2 + 1)$

Figure 51. Influence Diagram After Exchanging Nodes $x_1$ and $z_2$

Table 7. Comparison of Computed Path Coefficients (Kalman Gains)

Quantity          Exact Value                                                                              U-D             Influence Diagram
$z_1$ to $x_1$    $1/(1 + 2\epsilon^2)$                                                                    $1$             $1$
$z_1$ to $x_2$    $\epsilon/(1 + 2\epsilon^2)$                                                             $\epsilon$      $0$
$z_2$ to $x_1$    $(2\epsilon^2 - \epsilon)/(1 - 2\epsilon + 4\epsilon^2 + 2\epsilon^4)$                   $-\epsilon$     $-\epsilon$
$z_2$ to $x_2$    $(1 - \epsilon)/(1 - 2\epsilon + 4\epsilon^2 + 2\epsilon^4) \approx (1 - \epsilon)/(1 - 2\epsilon)$    $1 + \epsilon$  $1 + \epsilon$

There are actually two equations in the influence diagram algorithm that can cause cancellation
of significant digits. One is the equation $b'_{kj} = b_{kj} + b_{ki} b_{ij}$ used for calculating the regression
coefficient from predecessor nodes to the node being made less conditional. The other equation
is $b'_{ki} = b_{ki} - b'_{kj} b_{ji}$ used for calculating the regression coefficient from predecessor nodes to the
node being made more conditional. Errors caused by the first equation were shown in the previous
example. The second equation can result in large relative error as well, but its overall effect will
be less. This is because the new regression coefficient, no matter how inaccurate, is only one of
at least two (probably more) regression coefficients leading to that node as it becomes conditioned
on more of the other nodes. The effect of these other predecessor coefficients tends to reduce the
relative  effect  of  the  coefficient  in  error. 
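
The cancellation in Bierman’s example can be reproduced in ordinary double precision. The sketch below is an illustration under stated assumptions: $\epsilon = 2^{-30}$ is chosen so that $1 + \epsilon \neq 1$ but $1 + \epsilon^2 = 1$ in IEEE arithmetic, the helper `path_coefficient` is a hypothetical name, and Python’s `fractions.Fraction` supplies the exact value for comparison.

```python
from fractions import Fraction

eps = 2.0 ** -30
# The rounding assumption of the example holds for this eps in double precision.
assert 1.0 + eps != 1.0 and 1.0 + eps * eps == 1.0


def path_coefficient(b_z1x2, b_z1x1, b_x1x2):
    """Effect of z1 on x2: the direct arrow plus the path through x1."""
    return b_z1x2 + b_z1x1 * b_x1x2


# Rounded arithmetic: the coefficient on the arrow z1 -> x1 rounds to exactly 1.
rounded = path_coefficient(1.0 / (2.0 * eps),
                           1.0 / (2.0 * eps * eps + 1.0),
                           -1.0 / (2.0 * eps))

# Exact rational arithmetic with the same inputs.
e = Fraction(1, 2 ** 30)
exact = path_coefficient(1 / (2 * e), 1 / (2 * e * e + 1), -1 / (2 * e))

print(rounded)        # 0.0, every significant digit has cancelled
print(float(exact))   # approximately eps
```

The exact result equals $\epsilon/(2\epsilon^2 + 1)$, so the rounded value of zero has a relative error of 100 percent even though each input is accurate to full working precision.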

Although  this  example  demonstrates  rounding  errors  in  the  influence  diagram,  it  also  shows 
the  capability  of  the  influence  diagram  to  avoid  such  problems.  If  the  order  of  node  reversal 
is  changed,  then  the  problem  of  canceling  significant  digits  during  arrow  reversal  is  minimized. 
However,  to  the  author’s  knowledge,  there  are  no  guidelines  as  to  the  “best”  order  for  an  influence 
diagram  for  avoiding  numeric  difficulties. 

The original order of the nodes $x_1$, $x_2$, and $z_1$ is shown in Figure 52 along with an alternative
ordering. If the second ordering is used, the problem encountered in the previous example does
not occur. Instead of proceeding through the entire example again, Figure 53 shows the results
after exchanging to the point that node $z_1$ is unconditional. Again, the diagram uses rounded
values but the exact values are included for comparison. The rounded value of the path coefficient
from $z_1$ to $x_2$ is exactly the same as that of the U-D filter. The rest of the calculations for making $z_2$
unconditional are not shown here because the error in the earlier calculation was the cause of the
problems at the end. There are no more significant rounding errors in the rest of the problem.
The important conclusion is that, if significant cancellation of significant digits occurs, it may be
possible to reorder the sequence of node reversals and improve the numeric results.

Figure 52. Two Different Orderings of the First Three Nodes in Bierman’s Example

It may also be possible to choose an order for the vector x during problem setup that is
optimum for numeric results. Although possible, this is not yet a satisfactory approach either.
The numeric properties of the influence diagram are not well understood. Except for this research
paper, there is little insight into what circumstances yield poor numeric results. This is one of the
key reasons that further research is needed on the numeric properties of the influence diagram. The
need for more research is reflected in the recommendations at the end of this paper.

5.3.3 Potential Errors in the U-D Filter. The U-D factored form of the filter can also have
cancellation of significant digits to the point where all remaining significant digits are in error. One
set of circumstances that causes such cancellation of digits is when the off-diagonal terms of the
U matrix are reduced by a measurement update. The algorithm for computing the off-diagonal
terms of the U matrix requires subtraction, just as does the influence diagram algorithm for
recomputing regression coefficients from predecessor nodes.

For example, modify Bierman’s example in the previous subsection so that the measurement
model is

$$\mathbf{z} = \begin{bmatrix} \epsilon & 1 \\ 1 & 0 \end{bmatrix} \mathbf{x} + \mathbf{v} \tag{224}$$

This measurement matrix creates the circumstances described earlier for cancellation of significant
digits in the U-D filter. The scalar update with the first row of H sets the (1,2) term of the
U matrix to $-1/(2\epsilon)$, a very large term. It is normal for the off-diagonal terms of the U matrix
to be nonzero, but in this example, the situation was created purposefully.

When the update is made using the second row of H, the off-diagonal term of U decreases. In
this case, the subtraction results in the cancellation of all significant digits, and the new $U_{12}$ term
is incorrectly calculated as zero. The calculations just described for the U-D filter are as follows.

Let the initial conditions be:

$$D = \begin{bmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{225}$$


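For the first scalar update, use $H = [\epsilon \;\; 1]$.

$f = U^T H^T$: $f_1 = \epsilon$, $f_2 = 1$
$v_1 = 1/\epsilon$, $v_2 = 1/\epsilon^2$
$a_0 = R = 1$

for k=1

    $a_1 = a_0 + f_1 v_1 = 2$
    $D_{11}^+ = D_{11}^- a_0 / a_1 = 1/(2\epsilon^2)$
    $b_1 = v_1 = 1/\epsilon$

for k=2, j=1

    $a_2 = a_1 + f_2 v_2 = 2 + 1/\epsilon^2 \approx 1/\epsilon^2$
    $D_{22}^+ = D_{22}^- a_1 / a_2 = 2$
    $b_2 = v_2 = 1/\epsilon^2$
    $p_2 = -f_2 / a_1 = -1/2$
    $U_{12}^+ = U_{12}^- + b_1 p_2 = -1/(2\epsilon)$
    $b_1 = b_1 + U_{12}^- v_2 = 1/\epsilon$

Repeat for the next scalar update, use $H = [1 \;\; 0]$.

$f = U^T H^T$: $f_1 = 1$, $f_2 = -1/(2\epsilon)$
$v_1 = 1/(2\epsilon^2)$, $v_2 = -1/\epsilon$
$a_0 = R = 1$

for k=1

    $a_1 = a_0 + f_1 v_1 = 1 + 1/(2\epsilon^2) \approx 1/(2\epsilon^2)$
    $D_{11}^+ = D_{11}^- a_0 / a_1 = 1$
    $b_1 = v_1 = 1/(2\epsilon^2)$

for k=2, j=1

    $a_2 = a_1 + f_2 v_2 = 1/(2\epsilon^2) + 1/(2\epsilon^2) = 1/\epsilon^2$
    $D_{22}^+ = D_{22}^- a_1 / a_2 = 1$
    $b_2 = v_2 = -1/\epsilon$
    $p_2 = -f_2 / a_1 = \epsilon$
    $U_{12}^+ = U_{12}^- + b_1 p_2 = -1/(2\epsilon) + 1/(2\epsilon) = 0$
    $b_1 = b_1 + U_{12}^- v_2 = 1/(2\epsilon^2) + 1/(2\epsilon^2) = 1/\epsilon^2$

In this example, $U_{12}^+$ is incorrectly calculated as zero. The cancellation of significant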
digits  is  exactly  the  same  kind  of  error  demonstrated  in  the  previous  example  using  the  influence 
diagram.  Since  both  the  influence  diagram  and  the  U-D  filter  implementations  are  factored  forms 
of  the  covariance  matrix,  it  is  reasonable  to  expect  the  errors  to  be  similar. 
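
The hand calculation above can be checked with a small program. The following is a minimal sketch of the scalar U-D measurement update as outlined in the steps of this subsection, assuming $P = U D U^T$ with a unit upper triangular U stored as a list of rows; the function name `ud_scalar_update` and the choice $\epsilon = 2^{-30}$ (so that every intermediate value is a power of two in double precision) are assumptions made for the illustration.

```python
def ud_scalar_update(U, D, h, r):
    """One scalar U-D measurement update; returns new (U, D, gain)."""
    n = len(D)
    f = [sum(U[i][j] * h[i] for i in range(n)) for j in range(n)]   # f = U^T h^T
    v = [D[j] * f[j] for j in range(n)]                             # v = D f
    U = [row[:] for row in U]          # work on copies, leave the inputs untouched
    D = D[:]
    b = [0.0] * n
    a_prev = r                         # a_0 = R
    for k in range(n):
        a_k = a_prev + f[k] * v[k]
        D[k] = D[k] * a_prev / a_k
        b[k] = v[k]
        p = -f[k] / a_prev
        for j in range(k):
            u_old = U[j][k]
            U[j][k] = u_old + b[j] * p     # the subtraction that can cancel
            b[j] = b[j] + u_old * v[k]
        a_prev = a_k
    return U, D, [bj / a_prev for bj in b]


eps = 2.0 ** -30
U0 = [[1.0, 0.0], [0.0, 1.0]]
D0 = [1.0 / eps ** 2, 1.0 / eps ** 2]

U1, D1, _ = ud_scalar_update(U0, D0, [eps, 1.0], 1.0)   # first row of H
U2, D2, _ = ud_scalar_update(U1, D1, [1.0, 0.0], 1.0)   # second row of H

print(U1[0][1], D1)   # -1/(2 eps) and [1/(2 eps^2), 2]
print(U2[0][1], D2)   # 0.0 and [1.0, 1.0]
```

Working the same update in exact arithmetic gives $U_{12}^+ = -\epsilon/(2\epsilon^2 + 1)$, so the computed zero has lost every significant digit, in agreement with the hand calculation.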

5.3.4  Numeric  Analysis  Generalizations.  Some  generalizations  can  be  made  about  some  of 
the  situations  that  cause  numeric  problems  in  both  the  U-D  filter  and  the  influence  diagram.  These 
generalizations  are  based  on  simple  examples  such  as  demonstrated  earlier,  and  on  a  very  limited 
number  of  experiments,  using  commercially  available  computer  software.  For  both  algorithms, 
these  generalizations  do  not  cover  all  of  the  possible  conditions  for  numeric  errors. 

In  an  influence  diagram,  if  only  one  state  variable  is  affected  by  a  measurement  (H  has  only 
one  nonzero  element  in  a  row),  then  the  best  numeric  results  occur  when  that  state  variable  is  the 
first  one.  Changing  the  unconditional  variance  of  the  first  node  in  the  ordered  sequence  is  sufficient 
to  account  for  the  decreased  variance  due  to  the  update.  No  other  calculations  are  necessary,  and 
there  is  less  likelihood  of  numerical  errors. 

In  matrix  terms,  let  H  be  a  row  vector  with  the  first  element  nonzero,  and  all  other  elements 
zero. Let this H matrix represent the measurement model for a Kalman filter update. The $U^T D U$
factorization  of  the  covariance  matrix  and  the  updated  version  of  that  covariance  matrix  will  differ 
in  only  the  first  term  of  the  diagonal  matrix.  There  will  be  no  change  in  the  U  matrix  and  there 
will  be  low  relative  error  in  computing  the  updated  covariance.  Therefore,  a  good  way  to  avoid 
cancellation  of  significant  digits  during  update  is  to  order  the  variables  in  the  state  estimate  such 
that  the  updated  variables  are  first. 

By  analogy,  the  U-D  filter  has  the  opposite  problem.  When  the  first  variable  is  updated,  as 
was  done  in  the  modified  version  of  Bierman’s  example,  the  U  matrix  terms  were  significantly  in 
error.  If  the  ordering  of  the  updated  variables  were  reversed  such  that  H  =  [0  1],  then  no  change 
would have occurred to the U matrix. To minimize cancellation of significant digits during update
in the U-D factored form of the filter, the variables in the state estimate should be ordered such
that  the  updated  variables  are  last. 
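
Both ordering rules can be checked on a two-state example with exact rational arithmetic. The sketch below is illustrative: the covariance values are arbitrary, the helper names are hypothetical, and identifying the influence diagram factorization with the unit lower triangular form $P = L D L^T$ (that is, $L = U^T$ in the $U^T D U$ notation) is an assumption based on the mirror-image relationship described in this chapter.

```python
from fractions import Fraction as F


def covariance_update(P, h, r):
    """Conventional scalar update P+ = P - P h^T h P / (h P h^T + r), 2x2 case."""
    Ph = [P[0][0] * h[0] + P[0][1] * h[1], P[1][0] * h[0] + P[1][1] * h[1]]
    a = h[0] * Ph[0] + h[1] * Ph[1] + r
    return [[P[i][j] - Ph[i] * Ph[j] / a for j in range(2)] for i in range(2)]


def factor_lower(P):
    """P = L D L^T with L unit lower triangular; returns (l21, d1, d2)."""
    d1 = P[0][0]
    l21 = P[1][0] / d1
    return l21, d1, P[1][1] - l21 * l21 * d1


def factor_upper(P):
    """P = U D U^T with U unit upper triangular; returns (u12, d1, d2)."""
    d2 = P[1][1]
    u12 = P[0][1] / d2
    return u12, P[0][0] - u12 * u12 * d2, d2


P = [[F(4), F(2)], [F(2), F(3)]]
r = F(1)

# Measuring only the first state changes only d1 of the lower-triangular form.
print(factor_lower(P), factor_lower(covariance_update(P, [F(1), F(0)], r)))
# Measuring only the last state changes only d2 of the U-D form.
print(factor_upper(P), factor_upper(covariance_update(P, [F(0), F(1)], r)))
```

In each case the off-diagonal factor and the other diagonal term are untouched by the update, so no subtraction of nearly equal off-diagonal terms is needed.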

5.4  Chapter  Conclusions 

This  chapter  used  the  arithmetic  properties  of  the  influence  diagram  and  their  relationship 
with  matrix  operations  to  analyze  the  numeric  properties  of  the  influence  diagram  implementation 
of  the  discrete-time  filter.  It  was  shown  that  the  influence  diagram  computes  the  conditional 
covariance  of  a  random  vector  as  a  series  of  scalar  operations.  This  resulted  in  better  numeric 
properties  than  the  Kalman  filter  conditional  covariance  matrix  equation.  It  was  also  shown  that 
the  influence  diagram  uses  a  stable  algorithm  to  reverse  the  node  order  and  calculate  conditional 
variances. 

It  is  difficult  to  put  a  strict  bound  on  the  errors  that  might  be  caused  by  the  influence  diagram 
algorithm.  Even  though  the  factored  forms  of  the  matrices  may  have  bounded  errors,  there  is  no 
guarantee  that  all  elements  of  the  matrix  will  have  a  bounded  relative  error.  However,  because  of 
the  triangular  form  of  the  matrices,  the  errors  will  usually  be  small. 

Finally,  examples  were  purposefully  constructed  that  showed  the  worst  case  errors  that  might 
occur in both the U-D filter and the influence diagram discrete-time filter. It was shown that the
conditions  for  error  in  both  filter  implementations  are  similar.  Even  though  experimental  evidence 
is  limited,  the  initial  indications  are  that  the  U-D  filter  and  the  influence  diagram  have  almost 
identical error properties.

VI. Summary


6.1  Conclusions 

The  purpose  of  this  research  was  to  evaluate  the  influence  diagram  as  an  alternative  method 
for  the  discrete-time  filter.  Kenley’s  doctoral  dissertation  [6]  laid  the  groundwork  by  proving 
the  influence  diagram  could  be  used  for  jointly  Gaussian  random  variables.  He  also  proved  the 
basic  matrix  relationships  and  demonstrated  the  influence  diagram  for  discrete-time  filtering.  This 
research  built  on  Kenley’s  original  work.  Some  of  the  more  important  results  from  this  thesis  are 
summarized  here. 

The influence diagram implements a factored form of the Kalman filter just as does the U-D
filter. These two implementations are essentially mirror images. One is related to the lower
triangular  version  of  the  Cholesky  decomposition,  and  the  other  is  related  to  the  upper  triangular 
version.  Because  of  this  similarity,  it  could  be  expected  that  the  efficiency  and  numerical  properties 
of  the  two  algorithms  are  very  similar. 

The  influence  diagram  implementation  was  shown  to  have  some  advantages  over  the  U-D 
filter.  Specifically,  it  can  be  more  efficient  in  terms  of  computational  loading.  It  also  lends  itself  to 
parallel  processing  architectures  with  a  resulting  reduction  in  processing  time. 

Based  on  theory  and  limited  experimentation,  the  influence  diagram  probably  has  numeric 
properties  equivalent  to  those  of  the  U-D  filter.  It  appears  that  there  are  conditions  under  which 
either  one  may  be  better  than  the  other,  but  it  is  unlikely  that  either  is  inherently  better  or  worse 
than  the  other.  The  one  advantage  of  the  influence  diagram  is  that  almost  any  numeric  error  can 
be  traced  to  one  equation. 

Perhaps  the  most  important  advantage  is  intangible.  The  influence  diagram  is  a  graphic 
tool  that  gives  the  user  tremendous  insight  into  the  meaning  of  the  numerical  operations.  The 
calculations are much easier to understand than the comparable calculations for the U-D filter.
This is because the numeric values have physical meaning as conditional variances and regression
coefficients. 

6.2 Other Applications

This  research  only  described  the  characteristics  of  the  influence  diagram  when  applied  to 
discrete-time filtering. There are potential uses in fault detection and hypothesis testing, in areas
where  the  inverse  covariance  (information)  matrix  is  better  suited,  in  optimal  smoothing,  optimal 
control,  and  parameter  estimation.  Furthermore,  the  discrete  probability  version  of  the  influence 
diagram  can  be  used  for  probabilistic  analysis  of  Markov  processes. 

6.2.1 Fault Detection/Hypothesis Testing. Assume that measurements are in the form of
scalar updates as discussed earlier in this thesis. As each node of $z(t_i)$ is moved to the beginning
of the ordered sequence, the remaining nodes of $z(t_i)$ are conditioned on it. As seen previously,
the variance of a conditioned node is smaller than the unconditional variance of the same node.
It is this smaller variance which is useful for failure detection. This method is similar to the
Kalman filter approach which uses $H(t_i)P(t_i^-)H^T(t_i) + R(t_i)$ as the predicted variance matrix of
the variables of $z(t_i)$ [7:230]. The influence diagram incorporates each scalar update to make the
variance  smaller  on  succeeding  measurements. 
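
A rough sketch of how the shrinking predicted variance might be used for failure detection follows. It is not taken from the thesis: it assumes a single scalar state with zero prior mean, scalar measurements given as hypothetical `(h, r, z)` triples, and an arbitrary three-sigma gate on the residual.

```python
def sequential_residual_check(p, measurements, gate=3.0):
    """Process scalar measurements (h, r, z) one at a time.

    Returns a list of (predicted_variance, flagged) pairs, one per measurement.
    """
    x = 0.0                                   # prior mean of the scalar state
    report = []
    for h, r, z in measurements:
        a = h * p * h + r                     # predicted variance of this measurement
        residual = z - h * x
        flagged = residual * residual > gate * gate * a
        report.append((a, flagged))
        if not flagged:                       # only accepted measurements update the state
            k = p * h / a
            x = x + k * residual
            p = p * r / a                     # division form of the variance update
    return report


# Hypothetical data: two consistent readings followed by an outlier.
report = sequential_residual_check(100.0, [(1.0, 1.0, 0.5), (1.0, 1.0, 0.4), (1.0, 1.0, 9.0)])
print(report)
```

With these numbers the third reading is rejected because the first two have already reduced the predicted variance; the same reading processed first, against the large prior variance, would pass the gate.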

This  property  has  potential  use  in  hypothesis  testing  and  fault  detection.  It  would  be  best  to 
incorporate  the  first  measurements  from  sensors  that  are  reliable,  or  that  are  measuring  parameters 
which are not part of the hypothesis. The later measurements would come from sensors that are
more likely to fail, or that are measuring parameters that are part of a hypothesis. In both cases,
the tighter bounds of the conditional variance give later measurements more discriminating power.

6.2.2  Inverse  Covariance.  The  influence  diagram  is  based  on  determining  the  conditional 
mean and variance of Gaussian random variables. It was shown earlier that this conditional variance
has a direct correspondence to the inverse covariance matrix. Instead of using the conditional
variance to describe each variable, the “information level”, taken to be the inverse of the variance,
could be used [7:238-241]. The influence diagram algorithm would be modified to use the
inverse  of  the  variance.  The  most  conditional  node  in  the  influence  diagram  (the  last  one)  would 
be  considered  as  having  the  most  information  from  predecessors.  The  information  level  of  this  last 
node  would  be  identical  to  the  corresponding  diagonal  term  of  the  inverse  covariance  matrix. 
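
The last statement can be verified directly for two variables. The fragment below is a small check with arbitrary illustrative covariance values, using exact rational arithmetic; the variable names are hypothetical.

```python
from fractions import Fraction as F

P = [[F(4), F(2)], [F(2), F(3)]]          # covariance of (x1, x2), illustrative values

# Conditional variance of the last node x2 given its predecessor x1.
b12 = P[0][1] / P[0][0]                   # regression coefficient of x2 on x1
v2_given_1 = P[1][1] - b12 * b12 * P[0][0]

# (2,2) element of the inverse covariance matrix, written out for the 2x2 case.
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
info_22 = P[0][0] / det

print(1 / v2_given_1, info_22)            # 1/2 1/2
```

The inverse of the conditional variance of the last node and the corresponding diagonal element of the inverse covariance matrix agree exactly.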

One  application  of  such  an  approach  is  optimal  smoothing.  The  influence  diagram  could  be 
used  to  determine  the  optimal  estimate  based  on  all  previous  measurements.  The  inverse  covariance 
form  could  be  used  to  incorporate  information  from  later  measurements.  For  the  influence  diagram, 
this  becomes  nothing  more  than  reversing  all  arrows  from  later  measurements  to  point  to  the  desired 
estimate.  The  desired  state  estimate  then  becomes  conditioned  on  the  later  measurements.  The 
combination  of  the  two  becomes  the  optimal  estimate  of  the  states  based  on  all  measurements,  both 
previous  and  later. 

6.2.3 Optimal Control. The influence diagram came from the field of decision analysis. The
Gaussian  influence  diagram  also  allowed  decisions  based  on  linear,  quadratic  cost  functions  of 
appropriate  variables.  This  aspect  of  the  influence  diagram  was  described  in  detail  in  Kenley’s 
original  work,  but  has  not  been  addressed  in  this  thesis.  It  is  obvious  that  decisions  based  on  linear 
system models and quadratic costs on jointly Gaussian random variables are equivalent to LQG
control.  It  remains  to  be  seen  whether  the  influence  diagram  offers  efficient  implementation  of  such 
control  inputs. 

Another approach to optimal control relies on the dual nature of the optimal full-state feedback
controller and the Kalman filter. Because the influence diagram implements a factored form of
the Kalman filter, it could also be used to implement a factored form of an LQG optimal controller.

6.2.4 Parameter Estimation. Throughout this thesis, the coefficients on the arrows between
nodes  have  been  called  regression  coefficients.  In  the  Kalman  filter,  it  is  assumed  that  these 
coefficients  are  known  elements  of  either  the  state  transition  matrix  or  the  measurement  matrix. 
As  such,  these  regression  coefficients  are  not  random  variables. 

In  practice,  it  may  be  true  that  one  or  more  of  these  coefficients  is  unknown  or  uncertain. 
Under  linear  system  model  assumptions,  using  known  inputs  and  measured  outputs,  the  value 
of  such  unknown  or  uncertain  parameters  may  be  estimated.  If  such  analysis  uses  conventional 
least squares techniques, then determining the parameters is identical to determining regression
coefficients.  Such  insights  would  be  useful  in  system  identification. 

6.2.5 Discrete Probability Applications. In the Gaussian random variable influence diagram,
it was the Markov nature of the random variable that permitted a complete description of the
density function at a given time, based on the density function at a previous time. The same
is true of a Markov process with discrete probability distributions. If the nodes of the discrete-time
filter described in this thesis are replaced with discrete-time, discrete-probability-distribution
random variables, then they become the description of a Markov process. Observations of such a
process  can  be  probabilistic  as  well.  Although  the  mathematics  are  not  as  easy  as  for  the  simple 
Gaussian  case,  such  an  approach  would  be  a  Bayesian  method  of  estimating  the  states  of  a  Markov 
process,  based  on  uncertain  observations. 

6.3 Recommendations

There  are  several  areas  left  unexplored  in  this  research.  Probably  the  most  useful  research 
would  be  a  definitive  study  of  the  conditions  for  best  and  worst  numeric  performance.  Such  a  study 
would  probably  require  both  implementations  running  on  one  computer.  The  computer  should  also 
be capable of varying the number of significant bits used internally for storage and calculations.

Another useful study would be an attempt to decrease the numeric errors of the influence
diagram. The types of errors that occur in the influence diagram are similar to the types of errors
that occur in the U-D filter. However, as shown earlier, the order of the operations may affect the
accuracy of the influence diagram operations. An example of a way to minimize errors is to identify
them as they occur. In a digital computer, this could be done by monitoring the two equations
responsible for cancellation of significant digits. If either of them results in a significant decrease
in value (e.g. $|b'_{kj}| < |b_{kj}/1000|$), then the current calculations would be halted, and the algorithm
would proceed using a different order for further reversals. Such a computer program would also
need  to  identify  those  situations  where  the  regression  coefficient  is  unavoidably  small,  or  where  it 
may  be  intentionally  zero. 
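
One possible form of such a monitor is sketched below. It is only an illustration of the rule of thumb in the preceding paragraph: the function name `guarded_add`, the default factor of 1000, and the decision to ignore coefficients that start at zero are all assumptions, and the check by itself cannot tell an unavoidable small result from a numerical failure.

```python
def guarded_add(old, correction, factor=1000.0):
    """Return (new_value, suspect) for an update of the form new = old + correction.

    suspect is True when the magnitude drops by more than `factor`.
    """
    new = old + correction
    suspect = old != 0.0 and abs(new) < abs(old) / factor
    return new, suspect


eps = 2.0 ** -30
print(guarded_add(1.0 / (2.0 * eps), -1.0 / (2.0 * eps)))   # (0.0, True), total cancellation
print(guarded_add(1.0, 0.5))                                # (1.5, False), ordinary update
```

A caller that receives `suspect == True` could abandon the current reversal and retry with a different node ordering, as suggested in the text.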

It  may  be  useful  to  analyze  the  matrix  operations  for  the  influence  diagram  in  more  detail.  It 
may  be  that  rows  and  columns  of  the  factored  covariance  matrix  may  be  reordered  more  efficiently 
than the “two at a time” method of the influence diagram. If that were the case, then whole
blocks  of  rows  could  be  reordered.  Such  an  operation  would  be  much  more  efficient  than  the 
current  method. 

As mentioned earlier, there is a relationship between decision and control theory. The influence
diagram can be used to make decisions, based on jointly Gaussian random variables and
linear system models. These conditions are also the assumptions needed for LQG control. The
two methods should be equivalent. Even though this seems reasonable, this equivalence has not
yet been proven. It would be very useful to compare the influence diagram and LQG control.
Such a comparison would need to prove the relationship of the two methods, and to evaluate their
efficiency and numerical properties.

Appendix A. Operations Count


This  appendix  is  an  explanation  of  the  method  used  to  calculate  the  number  of  operations 
required  to  implement  the  influence  diagram  algorithm  for  discrete  time  filtering.  Assume  an 
influence  diagram  of  the  form  of  Figure  42  in  Chapter  3.  The  new  regression  coefficients  between 
$x(t_{i-1})$ and $x(t_i)$ must be modified using


(226) 


where $B_i$ is the influence diagram B matrix corresponding to the influence diagram factorization
of the covariance matrix, and I is the identity matrix of appropriate dimension. The term
$(I - B_i)$ requires no operations because the diagonal terms of $B_i$ are zero.

Because the matrix $(I - B_i)^T$ is a lower triangular unit matrix, only the nonzero additions
and the non-unity multiplications need to be counted. For n-dimensional matrices, the number of
multiplications and additions is:


$$\frac{n(n-1)}{2} \tag{227}$$

After computing the new regression coefficients, the influence diagram of Figure 43 can be
drawn.


Now  determine  the  number  of  operations  needed  to  remove  a  node.  Assume  a  node  in  the 
middle  of  an  ordered  sequence  is  to  be  moved  to  the  end  of  the  ordered  sequence  and  removed.  If  i 
is  the  predecessor  node  and  j  is  the  successor  node,  then  the  equation  for  calculating  the  variance 
of  the  new  predecessor  after  reversal  is: 


$$v'_j = v_j + b_{ij}^2 v_i \tag{228}$$

This equation requires 3 multiplications and 1 addition.

Assume that both nodes have the set of predecessors K, where the elements of K are designated
k. The regression coefficients from each node k to the new predecessor are calculated
using:

$$b'_{kj} = b_{kj} + b_{ki} b_{ij} \tag{229}$$

This  equation  requires  1  multiplication  and  1  addition. 

For  the  new  successor,  the  equation  for  calculating  the  new  variance  is  modified  slightly  to 
increase  the  efficiency.  The  new  successor’s  variance  is  calculated  using: 

$$v_{ratio} = \frac{v_i}{v'_j} \tag{230}$$

$$v'_i = v_j \, v_{ratio} \tag{231}$$

The new regression coefficient between the two nodes is calculated by:

$$b_{ji} = b_{ij} \, v_{ratio} \tag{232}$$

These equations require 1 division and 2 multiplications.

For the same set of predecessors K, the equation for modifying the regression coefficients to
the new successor is:

$$b'_{ki} = b_{ki} - b'_{kj} b_{ji} \tag{233}$$

This  equation  requires  1  multiplication  and  1  addition. 
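
Equations (228) through (233) can be collected into a single arrow-reversal routine. The Python sketch below is one possible arrangement, not the implementation used in this research: the dictionary-based storage, the function name `reverse_arc`, and the value $\epsilon = 1/1000$ are assumptions, and exact rational arithmetic is used so the result can be compared with the first reversal in Bierman’s example (Figure 47 to Figure 48, with node $z_2$ omitted for brevity).

```python
from fractions import Fraction as F


def reverse_arc(v, b, i, j, preds):
    """Reverse the arrow i -> j in place.

    v maps node -> conditional variance, b maps (from, to) -> regression
    coefficient (a missing key means zero), and preds lists the predecessors
    shared by i and j.
    """
    b_ij = b[(i, j)]
    v_j_new = v[j] + b_ij * b_ij * v[i]                            # Eq. (228)
    for k in preds:
        b[(k, j)] = b.get((k, j), 0) + b.get((k, i), 0) * b_ij     # Eq. (229)
    v_ratio = v[i] / v_j_new                                       # Eq. (230)
    v_i_new = v[j] * v_ratio                                       # Eq. (231)
    b_ji = b_ij * v_ratio                                          # Eq. (232)
    for k in preds:
        b[(k, i)] = b.get((k, i), 0) - b[(k, j)] * b_ji            # Eq. (233)
    del b[(i, j)]                                                  # old arrow is gone
    b[(j, i)] = b_ji
    v[i], v[j] = v_i_new, v_j_new


eps = F(1, 1000)
v = {"x1": 1 / eps**2, "x2": 1 / eps**2, "z1": F(1)}
b = {("x1", "x2"): F(0), ("x1", "z1"): F(1), ("x2", "z1"): eps}

reverse_arc(v, b, "x2", "z1", preds=["x1"])
print(v["z1"], v["x2"], b[("z1", "x2")], b[("x1", "x2")])   # 2 500000 500 -500
```

The printed values correspond to $2$, $1/(2\epsilon^2)$, $1/(2\epsilon)$ and $-1/(2\epsilon)$, the entries that follow from applying the equations to the Figure 47 values.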

When a node is moved to the end of an ordered sequence and removed, it must be
reversed with each successor node in the sequence. After the last reversal, the subject node becomes
a nuisance node and is removed from the diagram. There is no need to calculate its new conditional
variance or the values of the regression coefficients for it. This implies that, during the last reversal,
only Equation (228) and Equation (229) must be calculated.

Assume now that the node to be removed has n successors and m predecessors. Removal of
the node requires n - 1 reversals using Equations (228) through (233), and one removal using only
Equation (228) and Equation (229).

In order to make the calculations simpler, calculate the number of operations due to Equation
(229) and Equation (233) later. The number of operations to remove one node by making n - 1
reversals and one removal, using the remaining equations, is:

multiplications: 5n - 2
additions: n
divisions: n - 1

Now calculate the number of operations due to Equation (233). The node being removed and
the node with which it is exchanged both have the same number of predecessors. At first, there are
m predecessors, then m + 1, m + 2, and so on until the last exchange has m + n - 1 predecessors.
Therefore, the number of times Equation (233) will be used is

    \sum_{r=0}^{n-1} (m + r) = mn + n(n - 1)/2    (234)

There is one multiplication and one addition due to Equation (233) each time it is used. Equation
(229) is used m + n - 1 times less than Equation (233) because the regression coefficients are not
calculated after the last reversal. Together, the two equations require 2(mn + n(n - 1)/2) - (m + n - 1)
multiplications and additions to remove one node.
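
The single-node count can be checked by tallying operations step by step and comparing with the closed forms. The sketch below is illustrative only: the function names are hypothetical, and it assumes (following the earlier statement about the last reversal) that the final step applies only Equation (228) and one predecessor-coefficient equation.

```python
def count_single_removal(n, m):
    """Tally (multiplications, additions, divisions) for removing one node
    that has n successors and starts with m predecessors."""
    mult = add = div = 0
    for r in range(n):
        preds = m + r              # shared predecessors at this step
        is_last = (r == n - 1)     # final step: the node is removed
        # Eq. (228): 3 multiplications and 1 addition (counts as stated in the text).
        mult += 3
        add += 1
        # One predecessor-coefficient update per predecessor (Eq. 229).
        mult += preds
        add += preds
        if not is_last:
            # Eqs. (230)-(232): 1 division and 2 multiplications.
            div += 1
            mult += 2
            # Second predecessor-coefficient update (Eq. 233).
            mult += preds
            add += preds
    return mult, add, div


def closed_form_single_removal(n, m):
    """Closed-form counts quoted in the text for the same removal."""
    pred_ops = 2 * (m * n + n * (n - 1) // 2) - (m + n - 1)
    return (5 * n - 2) + pred_ops, n + pred_ops, n - 1
```

For n = 3 successors and m = 2 predecessors, both functions give 27 multiplications, 17 additions, and 2 divisions.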
Now assume that there are n nodes to be removed, as in Figure 43. Instead of m predecessors,
the first node removed will begin with n - 1 predecessors. The second node will begin with n - 2
predecessors. The last node will have no predecessors. The number of operations to remove n
nodes then becomes:

multiplications: (5n - 2)n + \sum_{m=0}^{n-1} [2(mn + n(n - 1)/2) - (m + n - 1)]
additions: n^2 + \sum_{m=0}^{n-1} [2(mn + n(n - 1)/2) - (m + n - 1)]
divisions: (n - 1)n

The result of these operations is the second influence diagram in Figure 43.
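
As a quick numerical check, the totals for removing n nodes can be evaluated directly from the per-node counts. This is a sketch under stated assumptions: the function name is hypothetical, and the additions total is assumed to mirror the multiplications total, with n additions per node in place of 5n - 2 multiplications.

```python
def removal_totals(n):
    """Operation counts for removing n nodes, where the node removed
    starts with m = n-1, n-2, ..., 0 predecessors."""
    # Predecessor-coefficient operations, summed over the starting counts m.
    pred_ops = sum(
        2 * (m * n + n * (n - 1) // 2) - (m + n - 1) for m in range(n)
    )
    mult = (5 * n - 2) * n + pred_ops
    add = n * n + pred_ops          # assumed: n additions per node plus pred_ops
    div = (n - 1) * n
    return mult, add, div
```

For n = 3 this evaluates to 66 multiplications, 36 additions, and 6 divisions. Expanding the sum by hand gives 2n^3 + 1.5n^2 - 0.5n multiplications, which agrees with the computed value.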

The next operation is to condition the nodes of x(t_i) on the nodes of z(t_i). This was depicted
in Figures 16 and 17. No operations are required to move directly from Figure 16 to the first
diagram of Figure 17 because R(t_i) is assumed to be diagonal. The variances of the nodes of z(t_i)
in Figure 17 are identical to the variances of the nodes of v(t_i) in Figure 16, which were also the
diagonal terms of the R(t_i) matrix.

If there are p nodes in the measurement, then they must be moved so that they are at the
beginning of the ordered sequence. These operations are depicted in Figure 17. Again, calculate
the number of operations due to Equation (228), Equation (231), and Equation (232) first, and
calculate the operations due to predecessor nodes later. Each node of z(t_i) must be moved past
the n nodes of x(t_i). This time, no nuisance nodes will be removed. Moving one node requires:

multiplications: 5n
additions: n
divisions: n

Now calculate the operations due to Equation (229) and Equation (233). When the first node
of z(t_i) is moved to the unconditional position, the number of applications of each of these equations
is:

    \sum_{m=0}^{n-1} m    (235)

The second node of z(t_i) starts off with one more predecessor than the first node. The number of
operations required to move it up n positions is:

    \sum_{m=0}^{n-1} (m + 1)    (236)

Each node of z(t_i) is moved up n positions similarly. Because there are p of them, the total
number of applications of Equations (229) and (233) is:

    \sum_{q=0}^{p-1} \left( \sum_{m=0}^{n-1} (m + q) \right)    (237)

This double sum adds to np(n + p - 2)/2 applications of both Equation (229) and Equation (233).
The total number of operations to move a vector of p nodes to the beginning of the ordered sequence
is:

multiplications: np(n + p - 2) + 5np
additions: np(n + p - 2) + np
divisions: np
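
The double sum and the resulting totals can be verified numerically. The following sketch uses a hypothetical function name and assumes, as the text states, that each application of Equations (229) and (233) costs one multiplication and one addition, and that moving one measurement node past n state nodes costs 5n multiplications, n additions, and n divisions before predecessors are considered.

```python
def measurement_move_totals(n, p):
    """Counts for moving p measurement nodes past n state nodes.

    Returns (applications, multiplications, additions, divisions), where
    `applications` is the number of times each of Eq. (229) and Eq. (233)
    is applied.
    """
    # Double sum: measurement node q (0-based) sees m + q predecessors at step m.
    applications = sum(m + q for q in range(p) for m in range(n))
    # Two equations, one multiplication and one addition per application.
    mult = 2 * applications + 5 * n * p
    add = 2 * applications + n * p
    div = n * p
    return applications, mult, add, div
```

With n = 3 and p = 2 the double sum is 9, which equals np(n + p - 2)/2, and the totals are 48 multiplications, 24 additions, and 6 divisions.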

The means are propagated as described in Chapter 3. The vector update algorithm uses p
additions to calculate the initial p residuals. The number of additions or multiplications to calculate
the sum of the series of inner products and the new conditional means is:

    \sum_{j=1}^{n} (p + j - 1)    (238)

This sum reduces to n(n + 2p - 1)/2 as given in Chapter 3.
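
For reference, the reduction of this sum follows from the arithmetic series; a short worked derivation is sketched below.

```latex
\sum_{j=1}^{n} (p + j - 1)
  = np + \sum_{j=1}^{n} (j - 1)
  = np + \frac{n(n-1)}{2}
  = \frac{n(n + 2p - 1)}{2}
```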
The Kalman filter update equation given in Equation (108) and the measurement prediction
of Equation (109) require a total of n(n + p) multiplications and (n - 1)(n + p) additions. The total
number of operations needed to update the conditional means is the sum of all these operation
counts. They are tabulated as:

multiplications: n(n + 2p - 1)/2 + n(n + p)
additions: n(n + 2p - 1)/2 + (n - 1)(n + p) + p

The total number of operations is the sum of all appropriate equations. These sums are:

multiplications: 2n^3 + n^2(p - 0.5) + n(p^2 + p - 0.5)
additions: 2n^3 + n^2(p + 3.5) + n(p^2 + 5p - 1.5)
divisions: n(n + p - 1)




Appendix B. Operations Count, Parallel Processing


This  appendix  is  an  explanation  of  the  method  used  to  calculate  the  number  of  operations 
required  to  implement  the  influence  diagram  algorithm  for  discrete  time  filtering,  assuming  parallel 
processing.  Assume  that  n  nodes  will  be  moved  past  another  group  of  n  nodes  and  removed.  Such 
was  the  case  shown  in  Figure  44  in  which  n  =  3.  The  equations  are  the  same  as  those  given  in 
Appendix  A. 

Begin  with  the  assumption  that  there  is  a  single  processor  dedicated  to  the  task  of  calculating 
all  necessary  equations  for  each  of  the  n  nodes  being  moved  rightwards.  For  example,  in  Figure 
44,  there  would  be  three  processors,  one  assigned  to  each  of  the  first  three  nodes.  Between  the 
first  and  second  influence  diagrams,  a  single  processor  would  reverse  nodes  3  and  4.  Between  the 
second  and  third  diagrams,  one  processor  would  be  used  to  exchange  nodes  3  and  5,  while  another 
would  be  used  to  exchange  2  and  4.  Moving  from  the  third  to  the  fourth  diagram  requires  all  three 
processors;  one  processor  removes  node  3,  another  reverses  nodes  2  and  5,  while  the  third  reverses 
nodes  1  and  4.  Moving  to  the  fifth  diagram  requires  only  two  processors,  one  to  remove  node  2  and 
another  to  reverse  nodes  1  and  5.  Finally,  only  a  single  processor  is  needed  to  remove  node  1  and 
result  in  the  last  influence  diagram. 
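
The staggered schedule described above can be generated programmatically. The sketch below is illustrative: the function name and the tuple layout are hypothetical, and the node numbering (moved nodes 1..n, stationary nodes n+1..2n-1, each moved node performing n - 1 reversals followed by one removal) is an assumption taken from the n = 3 example in the text.

```python
def pipeline_schedule(n):
    """Return a list of time intervals; each interval lists the actions
    carried out in parallel, one per active processor."""
    steps = [[] for _ in range(2 * n - 1)]
    # Node n starts first; each lower-numbered node starts one interval later.
    for node in range(n, 0, -1):
        start = n - node
        # n - 1 reversals with the stationary nodes n+1, ..., 2n-1.
        for r in range(n - 1):
            steps[start + r].append(("reverse", node, n + 1 + r))
        # The final action for this node is its removal.
        steps[start + n - 1].append(("remove", node))
    return steps


if __name__ == "__main__":
    for t, actions in enumerate(pipeline_schedule(3), start=1):
        print(t, actions)
```

For n = 3 the schedule has five intervals using 1, 2, 3, 2, and 1 processors, matching the sequence of diagrams described for Figure 44.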

During each time interval, one processor will take longer than the other processors to
complete all of its calculations. The processor needing the most time will be the one with the most
operations to complete. In the example of Figure 44, the processor assigned to node 3 will have
more calculations during the first two time intervals because it has more predecessor regression
coefficients. During the next time interval, node 2's processor has the most calculations because
node 3 is being removed and its processor has fewer equations. Similarly, node 1's processor has the
most calculations during the next time interval. During the last time interval, only one processor
is operating.

The number of operations required to update the conditional means and to calculate
\Phi(t_i, t_{i-1}) were given in Appendix A. Because of this, in the following computations,
only the operations needed to manipulate the influence diagram will be counted. Also, the only
operations counted at each time interval will be those of the processor taking the longest time.

As before, in Appendix A, calculate the number of operations due to Equations (228), (231),
and (232) first. Calculate the number of operations due to Equations (229) and (233) later. At each
time interval except the last, there is at least one processor reversing a pair of nodes. Equations
(228), (231), and (232) require 5 multiplications, 1 addition, and 1 division as a minimum in
each time interval. There are (n - 1) + (n - 1) such reversals as a vector of n nodes is removed.
Additionally, the last time interval only removes one node and requires 3 multiplications
and 1 addition.

The next step is to calculate the number of predecessors for the node with the most operations.
In every case, the node with the most operations is the node being reversed (not removed) that is
furthest to the right. During the first n - 1 successive time intervals, this node has n - 1, n,
n + 1, ..., 2n - 3 predecessors. During the next n - 1 successive time intervals, this node has
2n - 4, 2n - 5, ..., n - 1, n - 2 predecessors. The total number of predecessors over all of these
intervals is given by:

    \sum_{q=1}^{n-1} (n - 2 + q) + \sum_{q=1}^{n-1} (n - 3 + q) = 3n^2 - 8n + 5    (239)

There are 2 multiplications and 2 additions associated with each predecessor. When the last node
is removed, it only requires 1 multiplication and 1 addition for each of n - 1 predecessors.

For the removal of n nodes, the number of operations required is the sum, over all time
intervals, of the operations performed by the processor taking the longest time in each interval.
This is the same as adding all operations discussed above. These numbers are:

multiplications: 6n^2 - 5n + 2
additions: 6n^2 - 13n + 8
divisions: 2n - 2
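
These parallel totals can be reproduced by summing the longest-processor work interval by interval. The sketch below uses a hypothetical function name and relies only on the per-interval costs stated in the text: 5 multiplications, 1 addition, and 1 division per reversal interval, 2 multiplications and 2 additions per predecessor, and a final removal costing 3 multiplications and 1 addition plus 1 of each per predecessor.

```python
def parallel_removal_counts(n):
    """Longest-processor operation counts for removing a vector of n nodes.

    Returns (predecessor_total, multiplications, additions, divisions).
    """
    # Predecessor counts of the busiest node in each reversal interval.
    rising = range(n - 1, 2 * n - 2)           # n-1, n, ..., 2n-3
    falling = range(2 * n - 4, n - 3, -1)      # 2n-4, ..., n-1, n-2
    preds = sum(rising) + sum(falling)

    reversals = 2 * (n - 1)                    # intervals containing a reversal
    # Reversal intervals, then the final removal with n - 1 predecessors.
    mult = 5 * reversals + 2 * preds + 3 + (n - 1)
    add = 1 * reversals + 2 * preds + 1 + (n - 1)
    div = reversals
    return preds, mult, add, div
```

For n = 3 the predecessor total is 8, giving 41 multiplications, 23 additions, and 4 divisions, in agreement with the closed forms above.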

When two vectors are being reversed, and neither is being removed, the calculations are
similar. This time, reverse an n-dimensioned vector on the left of the diagram with a p-dimensioned
vector on the right. This is the operation occurring when x(t_i) is conditioned on z(t_i). The node
requiring the most operations is the node being reversed that is furthest to the right. It is not
in the same position as in the previous calculations because there are no nodes being removed.
During the first p - 1 time intervals there are n - 1, n, n + 1, ..., p + n - 3 predecessors for this node.
During the next n time intervals, there are p + n - 2, p + n - 3, ..., p, p - 1 predecessors. There are
2 multiplications and 2 additions for each predecessor operation. The number of multiplications or
additions due to predecessors is:

    2 \left[ \sum_{q=1}^{p-1} (n - 2 + q) + \sum_{q=1}^{n} (p - 2 + q) \right] = n^2 + n(4p - 5) + p^2 - 5p + 4    (240)
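
The predecessor count for the vector reversal can be checked the same way. In this sketch the function name is hypothetical, and the factor of 2 (two multiplications, or two additions, per predecessor) is applied to the summed predecessor counts as described in the paragraph above.

```python
def vector_reversal_pred_ops(n, p):
    """Multiplications (or additions) due to predecessors when an n-vector
    is reversed with a p-vector, counting only the busiest processor."""
    rising = range(n - 1, n + p - 2)           # n-1, n, ..., p+n-3  (p-1 terms)
    falling = range(n + p - 2, p - 2, -1)      # p+n-2, ..., p, p-1  (n terms)
    return 2 * (sum(rising) + sum(falling))
```

For n = 3 and p = 2 this gives 16, which equals n^2 + n(4p - 5) + p^2 - 5p + 4.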

The total number of operations must include operations needed to update the conditional
means, and to calculate \Phi(t_i, t_{i-1}). These were given before in Appendix A. The
total of all operations becomes:

multiplications: 9n^2 + 6n(p - 1) + p^2 + 1
additions: 9n^2 - n(6p - 19) + p^2 - 4p + 11
divisions: 3n + p - 3